The file ‘WeatherDownloads_202005_v002.Rmd’ contains code for dowloading and processing historical weather data as contained in METAR archives hosted by Iowa State University.
Data have been dowloaded and processed for several stations (airports) and years, with .rds files saved in “./RInputFiles/ProcessedMETAR”.
This module will perform exploratory data analysis on the processed weather files.
Each processed data file contains one year of hourly weather data for one station. Files are saved as ‘./RInputFiles/ProcessedMETAR/metar_kxxx_yyyy.rds’ where xxx is the three-digit airport code and yyyy is the four-digit year.
Each file contains the following variables:
There are several functions available for analysis:
plotCountsByMetric() - bar plots for counts by variable
plotNumCor() - plot two numeric variables against each other
plotFactorNumeric() - boxplot a numeric variable against a factor variable
corMETAR() - correlations between METAR variables
lmMETAR() - linear regression modeling for METAR variables
basicWindPlots() - plot wind speed and direction
getWindDirGroup() - convert wind direction to a grouping (e.g., N for 320-360-40)
consolidatePlotWind() - show frequency plots of wind direction, city, and month
The tidyverse library is loaded and the 2016 Detroit data is read in to show examples of the functions:
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.4
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
kdtw_2016 <- readRDS("./RInputFiles/ProcessedMETAR/metar_kdtw_2016.rds")
A variable mapping is created to allow for better readable variable names:
varMapper <- c(WindDir="Wind Direction (degrees)",
predomDir="General Prevailing Wind Direction",
WindSpeed="Wind Speed (kts)",
WindSpeed5="Wind Speed (kts), rounded to nearest 5 knots",
Visibility="Visibility (SM)",
TempC="Temperature (C)",
DewC="Dew Point (C)",
Altimeter="Altimeter (inches Hg)",
Altimeter10="Altimeter (inches Hg), rounded to nearest 0.1 inHg",
modSLP="Sea-Level Pressure (hPa)",
TempF="Temperature (F)",
DewF="Dew Point (F)",
TempF5="Temperature (F), rounded to nearest 5 degrees",
DewF5="Dew Point (F), rounded to nearest 5 degrees",
cType1="First Cloud Layer Type",
cLevel1="First Cloud Layer Height (ft)",
month="Month",
year="Year",
wType="Greatest Sky Obscuration",
day="Day of Month"
)
The function plotCountsByMetric() produces bar plots for counts by variable:
# Helper function for generating plots by key variables
plotcountsByMetric <- function(df,
mets,
title="",
rotateOn=20,
dropNA=TRUE,
diagnose=FALSE,
mapper=varMapper,
facetOn=NULL,
showCentral=FALSE
) {
# Function arguments
# df: dataframe or tibble containing raw data
# mets: character vector of variables for plotting counts
# title: character vector for plot title
# rotateOn: integer, x-axis labels will be rotated by 90 degrees if # categories >= rotateOn
# dropNA: boolean for whether to drop all NA prior to plotting (recommended for avoiding warnings)
# diagnose: boolean for whether to note in the log the number of NA observations dropped
# mapper: named list containing mapping from variable name to well-formatted name for titles and axes
# facetOn: a facetting variable for the supplied df (NULL for no faceting)
# showCentral: boolean for whether to show the central tendency over-plotted on the main data
# Function usage
# 1. By default, the function plots overall counts by metric for a given input
# 2. If facetOn is passed as a non-NULL, then the data in #1 will be facetted by facetOn
# 3. If showCentral=TRUE, then the overall mean will be plotted as a point on the main plot (only makes sense if facetOn has been selected)
# Plot of counts by key metric
for (x in mets) {
# If a facetting variable is provided, need to include this in the group_by
useVars <- x
if (!is.null(facetOn)) { useVars <- c(facetOn, useVars) }
dat <- df %>%
group_by_at(vars(all_of(useVars))) %>%
summarize(n=n())
if (dropNA) {
nOrig <- nrow(dat)
sumOrig <- sum(dat$n)
dat <- dat %>%
filter_all(all_vars(!is.na(.)))
if (diagnose & (nOrig > nrow(dat))) {
cat("\nDropping",
nOrig-nrow(dat),
"rows with",
sumOrig-sum(dat$n),
"observations due to NA\n"
)
}
}
# Create the main plot
p <- dat %>%
ggplot(aes_string(x=x, y="n")) +
geom_col() +
labs(title=title,
subtitle=paste0("Counts By: ", mapper[x]),
x=paste0(x, " - ", mapper[x]),
y="Count"
)
# If the rotateOn criteria is exceeded, rotate the x-axis by 90 degrees
if (nrow(dat) >= rotateOn) {
p <- p + theme(axis.text.x=element_text(angle=90))
}
# If facetting has been requested, facet by the desired variable
if (!is.null(facetOn)) {
p <- p + facet_wrap(as.formula(paste("~", facetOn)))
}
# If showCentral=TRUE, add a dot plot for the overall average
if (showCentral) {
# Get the median number of observations by facet, or the total if facetOn=NULL
if (is.null(facetOn)) {
useN <- sum(dat$n)
} else {
useN <- dat %>%
group_by_at(vars(all_of(facetOn))) %>%
summarize(n=sum(n)) %>%
pull(n) %>%
median()
}
# Get the overall percentages by x
centralData <- helperCountsByMetric(tbl=dat, ctVar=x, sumOn="n") %>%
mutate(centralValue=nPct*useN)
# Apply the median
p <- p + geom_point(data=centralData, aes(y=centralValue), color="red", size=2)
}
# Print the plot
print(p)
}
}
# Example for Detroit 2016 - using WindDir, cType1, month, wType
plotcountsByMetric(kdtw_2016,
mets=c("WindDir", "cType1", "month", "wType"),
title="Detroit, MI (2016)"
)
The function plotNumCor() plots two numeric variables against one another:
# Create a function for plotting two variables against each other
plotNumCor <- function(met,
var1,
var2,
title=NULL,
subT="",
dropNA=TRUE,
diagnose=FALSE,
mapper=varMapper,
facetOn=NULL,
showCentral=FALSE
) {
# Function arguments
# met: dataframe or tibble containing raw data
# var1: character vector of variable to be used for the x-axis
# var2: character vector of variable to be used for the y-axis
# title: character vector for plot title
# subT: character vector for plot subtitle
# dropNA: boolean for whether to drop all NA prior to plotting (recommended for avoiding warnings)
# diagnose: boolean for whether to note in the log the number of NA observations dropped
# mapper: named list containing mapping from variable name to well-formatted name for titles and axes
# facetOn: a facetting variable for the supplied met (NULL for no faceting)
# showCentral: boolean for whether to show the central tendency over-plotted on the main data
# Function usage
# 1. By default, the function plots overall counts by the provided x/y metrics, with each point sized based on the number of observations, and with an lm smooth overlaid
# 2. If facetOn is passed as a non-NULL, then the data in #1 will be facetted by facetOn
# 3. If showCentral=TRUE, then the lm smooth that best first to the overall data will be plotted (only makes sense if facetOn has been selected)
# Create the title if not passed
if (is.null(title)) {
title <- paste0("Hourly Observations of ", mapper[var1], " and ", mapper[var2])
}
# If a facetting variable is provided, need to include this in the group_by
useVars <- c(var1, var2)
if (!is.null(facetOn)) { useVars <- c(facetOn, useVars) }
# Pull the counts by useVars
dat <- met %>%
group_by_at(vars(all_of(useVars))) %>%
summarize(n=n())
# If NA requested to be excluded, remove anything with NA
if (dropNA) {
nOrig <- nrow(dat)
sumOrig <- sum(dat$n)
dat <- dat %>%
filter_all(all_vars(!is.na(.)))
if (diagnose) {
cat("\nDropping",
nOrig-nrow(dat),
"rows with",
sumOrig-sum(dat$n),
"observations due to NA\n"
)
}
}
p <- dat %>%
ggplot(aes_string(x=var1, y=var2)) +
geom_point(alpha=0.5, aes_string(size="n")) +
geom_smooth(method="lm", aes_string(weight="n")) +
labs(x=paste0(mapper[var1], " - ", var1),
y=paste0(mapper[var2], " - ", var2),
title=title,
subtitle=subT
)
# If facetting has been requested, facet by the desired variable
if (!is.null(facetOn)) {
p <- p + facet_wrap(as.formula(paste("~", facetOn)))
}
# If showCentral=TRUE, add a dashed line for the overall data
if (showCentral) {
p <- p + helperNumCor(dat, xVar=var1, yVar=var2, sumOn="n")
}
print(p)
}
# Example for Detroit 2016 - using TempC and TempF
plotNumCor(kdtw_2016, var1="TempC", var2="TempF", subT="Detroit, MI (2016)", diagnose=TRUE)
##
## Dropping 1 rows with 49 observations due to NA
# Example for Detroit 2016 - using TempC and DewC
plotNumCor(kdtw_2016, var1="TempC", var2="DewC", subT="Detroit, MI (2016)", diagnose=TRUE)
##
## Dropping 1 rows with 49 observations due to NA
# Example for Detroit 2016 - using Altimeter and modSLP
plotNumCor(kdtw_2016, var1="Altimeter", var2="modSLP", subT="Detroit, MI (2016)", diagnose=TRUE)
##
## Dropping 1 rows with 49 observations due to NA
The function plotFactorNumeric() creates box plots for a numeric variable against a factor variable:
# Updated function for plotting numeric by factor
plotFactorNumeric <- function(met,
fctVar,
numVar,
title=NULL,
subT="",
diagnose=TRUE,
showXLabel=TRUE,
mapper=varMapper,
facetOn=NULL,
showCentral=FALSE
) {
# Function arguments
# met: dataframe or tibble containing raw data
# fctVar: character vector of variable to be used for the x-axis (factor in the boxplot)
# numVar: character vector of variable to be used for the y-axis (numeric in the boxplot)
# title: character vector for plot title
# subT: character vector for plot subtitle
# diagnose: boolean for whether to note in the log the number of NA observations dropped
# showXLabel: boolean for whether to include the x-label (e.g., set to FALSE if using 'month')
# mapper: named list containing mapping from variable name to well-formatted name for titles and axes
# facetOn: a facetting variable for the supplied met (NULL for no faceting)
# showCentral: boolean for whether to show the central tendency over-plotted on the main data
# Function usage
# 1. By default, the function creates the boxplot of numVar by fctVar
# 2. If facetOn is passed as a non-NULL, then the data in #1 will be facetted by facetOn
# 3. If showCentral=TRUE, then the overall median of numVar by fctVar will be plotted as a red dot
# Create the title if not passed
if (is.null(title)) {
title <- paste0("Hourly Observations of ", mapper[numVar], " by ", mapper[fctVar])
}
# Remove the NA variables
nOrig <- nrow(met)
dat <- met %>%
filter(!is.na(get(fctVar)), !is.na(get(numVar)))
if (diagnose) { cat("\nRemoving", nOrig-nrow(dat), "records due to NA\n") }
# Create the base plot
p <- dat %>%
ggplot(aes_string(x=fctVar, y=numVar)) +
geom_boxplot(fill="lightblue") +
labs(title=title,
subtitle=subT,
x=ifelse(showXLabel, paste0(mapper[fctVar], " - ", fctVar), ""),
y=paste0(mapper[numVar], " - ", numVar)
)
# If facetting has been requested, facet by the desired variable
if (!is.null(facetOn)) {
p <- p + facet_wrap(as.formula(paste("~", facetOn)))
}
# If showCentral=TRUE, add a dot plot for the overall average
if (showCentral) {
centData <- helperFactorNumeric(dat, .f=median, byVar=fctVar, numVar=numVar)
p <- p + geom_point(data=centData, aes(y=helpFN), size=2, color="red")
}
# Render the final plot
print(p)
}
# Example for Detroit 2016 - using TempF and month
plotFactorNumeric(kdtw_2016,
fctVar="month",
numVar="TempF",
subT="Detroit, MI (2016)",
showXLabel=FALSE,
diagnose=TRUE
)
##
## Removing 49 records due to NA
# Example for Detroit 2016 - using WindSpeed and wType
plotFactorNumeric(kdtw_2016,
fctVar="wType",
numVar="WindSpeed",
subT="Detroit, MI (2016)",
showXLabel=TRUE,
diagnose=TRUE
)
##
## Removing 49 records due to NA
# Example for Detroit 2016 - using Visibility and wType
plotFactorNumeric(kdtw_2016,
fctVar="wType",
numVar="Visibility",
subT="Detroit, MI (2016)",
showXLabel=TRUE,
diagnose=TRUE
)
##
## Removing 49 records due to NA
An issue previous observed where visibility 1/16SM was interpreted as 16 statutory miles has been corrected in the ‘WeatherDownloads_202005_v002’ file.
The function corMETAR() calculates correlations among numeric variables in the METAR data:
# Function to calculate, display, and plot variable correlations
corMETAR <- function(met, numVars, subT="") {
# Keep only complete cases and report on data kept
dfUse <- met %>%
select_at(vars(all_of(numVars))) %>%
filter(complete.cases(.))
nU <- nrow(dfUse)
nM <- nrow(met)
myPct <- round(100*nU/nM, 1)
cat("\n *** Correlations use ", nU, " complete cases (", myPct, "% of ", nM, " total) ***\n", sep="")
# Create the correlation matrix
mtxCorr <- dfUse %>%
cor()
# Print the correlations
mtxCorr %>%
round(2) %>%
print()
# Display a heat map
corrplot::corrplot(mtxCorr,
method="color",
title=paste0("Hourly Weather Correlations\n", subT),
mar=c(0, 0, 2, 0)
)
}
# Example for Detroit, MI 2016
coreNum <- c("TempC", "TempF", "DewC", "DewF",
"Altimeter", "modSLP", "WindSpeed", "Visibility"
)
corMETAR(kdtw_2016, numVars=coreNum, subT="Detroit, MI (2016) METAR")
##
## *** Correlations use 8769 complete cases (99.4% of 8818 total) ***
## TempC TempF DewC DewF Altimeter modSLP WindSpeed Visibility
## TempC 1.00 1.00 0.92 0.92 -0.17 -0.24 -0.10 0.14
## TempF 1.00 1.00 0.92 0.92 -0.17 -0.24 -0.10 0.14
## DewC 0.92 0.92 1.00 1.00 -0.22 -0.28 -0.18 -0.07
## DewF 0.92 0.92 1.00 1.00 -0.22 -0.28 -0.18 -0.07
## Altimeter -0.17 -0.17 -0.22 -0.22 1.00 1.00 -0.37 0.15
## modSLP -0.24 -0.24 -0.28 -0.28 1.00 1.00 -0.35 0.14
## WindSpeed -0.10 -0.10 -0.18 -0.18 -0.37 -0.35 1.00 0.10
## Visibility 0.14 0.14 -0.07 -0.07 0.15 0.14 0.10 1.00
The function lmMETAR() runs simple linear regression models on the METAR data:
# Function for linear regressions on METAR data
lmMETAR <- function(met,
y,
x,
yName,
subT=""
) {
# Convert to formula
myChar <- paste0(y, " ~ ", x)
cat("\n *** Regression call is:", myChar, "***\n")
# Run regression
regr <- lm(formula(myChar), data=met)
# Summarize regression
print(summary(regr))
# Predict the new values
pred <- predict(regr, newdata=met)
# Plot the predictions
p <- met %>%
select_at(vars(all_of(y))) %>%
mutate(pred=pred) %>%
filter_all(all_vars(!is.na(.))) %>%
group_by_at(vars(all_of(c(y, "pred")))) %>%
summarize(n=n()) %>%
ggplot(aes_string(x=y, y="pred")) +
geom_point(aes(size=n), alpha=0.25) +
geom_smooth(aes(weight=n), method="lm") +
labs(title=paste0("Predicted vs. Actual ", yName, " - ", x, " as Predictor"),
subtitle=subT,
x=paste0("Actual ", yName),
y=paste0("Predicted ", yName)
)
print(p)
}
# Examples for Detroit, MI 2016
lmMETAR(kdtw_2016, "modSLP", "Altimeter", yName="Sea Level Pressure", subT="Detroit, MI (2016)")
##
## *** Regression call is: modSLP ~ Altimeter ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96361 -0.44022 -0.03758 0.40772 1.41448
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -16.44389 0.74723 -22.01 <2e-16 ***
## Altimeter 34.41514 0.02488 1383.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4961 on 8767 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.9954, Adjusted R-squared: 0.9954
## F-statistic: 1.913e+06 on 1 and 8767 DF, p-value: < 2.2e-16
lmMETAR(kdtw_2016, "modSLP", "Altimeter + TempF", yName="Sea Level Pressure", subT="Detroit, MI (2016)")
##
## *** Regression call is: modSLP ~ Altimeter + TempF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.57802 -0.12674 0.00481 0.12820 0.65708
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.713e+00 2.803e-01 -13.25 <2e-16 ***
## Altimeter 3.403e+01 9.302e-03 3658.80 <2e-16 ***
## TempF -2.351e-02 9.941e-05 -236.48 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1826 on 8766 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.9994, Adjusted R-squared: 0.9994
## F-statistic: 7.086e+06 on 2 and 8766 DF, p-value: < 2.2e-16
lmMETAR(kdtw_2016, "TempC", "DewC", yName="Temperature (C)", subT="Detroit, MI (2016)")
##
## *** Regression call is: TempC ~ DewC ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.174 -3.172 -1.165 2.822 17.828
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.17084 0.05373 114.9 <2e-16 ***
## DewC 0.99909 0.00470 212.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.463 on 8767 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.8375, Adjusted R-squared: 0.8375
## F-statistic: 4.519e+04 on 1 and 8767 DF, p-value: < 2.2e-16
The basicWindPlots() function creates plots for wind speed and direction:
# Generate basic wind plots
basicWindPlots <- function(met,
dirVar="WindDir",
spdVar="WindSpeed",
desc="",
gran="",
mapper=varMapper
) {
# Plot for the wind direction
wDir <- met %>%
ggplot(aes_string(x=dirVar)) +
geom_bar() +
labs(title=paste0(desc, " Wind Direction"), subtitle=gran,
y="# Hourly Observations", x=mapper[dirVar]
) +
theme(axis.text.x=element_text(angle=90))
print(wDir)
# Plot for the minimum, average, and maximum wind speed by wind direction
# Wind direction 000 is reserved for 0 KT wind, while VRB is reserved for 3-6 KT variable winds
wSpeedByDir <- met %>%
filter(!is.na(get(dirVar))) %>%
group_by_at(vars(all_of(dirVar))) %>%
summarize(minWind=min(get(spdVar)), meanWind=mean(get(spdVar)), maxWind=max(get(spdVar))) %>%
ggplot(aes_string(x=dirVar)) +
geom_point(aes(y=meanWind), color="red", size=2) +
geom_errorbar(aes(ymin=minWind, ymax=maxWind)) +
labs(title=paste0(desc, " Wind Speed (Max, Mean, Min) By Wind Direction"),
subtitle=gran,
y=mapper[spdVar],
x=mapper[dirVar]
) +
theme(axis.text.x=element_text(angle=90))
print(wSpeedByDir)
# Plot for the wind speed
pctZero <- sum(pull(met, spdVar)==0, na.rm=TRUE) / nrow(met)
wSpeed <- met %>%
filter_at(vars(all_of(spdVar)), all_vars(!is.na(.))) %>%
ggplot(aes_string(x=spdVar)) +
geom_bar(aes(y=..count../sum(..count..))) +
labs(title=paste0(round(100*pctZero), "% of wind speeds in ", desc, " measure 0 Knots"),
subtitle=gran,
y="% Hourly Observations",
x=mapper[spdVar]
)
print(wSpeed)
# Polar plot for wind speed and wind direction
wData <- met %>%
filter_at(vars(all_of(dirVar)), all_vars(!is.na(.) & !(. %in% c("000", "VRB")))) %>%
filter_at(vars(all_of(spdVar)), all_vars(!is.na(.))) %>%
mutate_at(vars(all_of(dirVar)), as.numeric) %>%
group_by_at(vars(all_of(c(dirVar, spdVar)))) %>%
summarize(n=n())
wPolarDirSpeed <- wData %>%
ggplot(aes_string(x=spdVar, y=dirVar)) +
geom_point(alpha=0.1, aes(size=n)) +
coord_polar(theta="y") +
labs(title=paste0(desc, " Direction vs. Wind Speed"),
subtitle=gran,
x=mapper[spdVar],
y=mapper[dirVar]
) +
scale_y_continuous(limits=c(0, 360), breaks=c(0, 90, 180, 270, 360)) +
scale_x_continuous(limits=c(0, 40), breaks=c(0, 5, 10, 15, 20, 25, 30, 35, 40)) +
geom_point(aes(x=0, y=0), color="red", size=2)
print(wPolarDirSpeed)
}
# Example for Detroit, MI 2016
basicWindPlots(kdtw_2016, desc="Detroit, MI (2016)", gran="KDTW METAR")
The getWindDirGroup() function maps wind direction to a category such as NNE. Because the METAR data are recorded in units of 10 degrees, either 4 groupings (90 degrees each) or 12 groupings (30 degrees each) are preferred, so that each category has the same underlying number of buckets:
# Extract the wind direction data from a processed METAR file
getWindDirGroup <- function(met, src) {
# Use the fullMETAR data and extract WindDir, WindSpeed, month
windPlotData <- met %>%
select(WindDir, WindSpeed, month) %>%
mutate(windDirGroup=factor(case_when(WindSpeed==0 ~ "No Wind",
WindDir=="VRB" ~ "Variable",
WindDir %in% c("350", "360", "010") ~ "N",
WindDir %in% c("020", "030", "040") ~ "NNE",
WindDir %in% c("050", "060", "070") ~ "ENE",
WindDir %in% c("080", "090", "100") ~ "E",
WindDir %in% c("110", "120", "130") ~ "ESE",
WindDir %in% c("140", "150", "160") ~ "SSE",
WindDir %in% c("170", "180", "190") ~ "S",
WindDir %in% c("200", "210", "220") ~ "SSW",
WindDir %in% c("230", "240", "250") ~ "WSW",
WindDir %in% c("260", "270", "280") ~ "W",
WindDir %in% c("290", "300", "310") ~ "WNW",
WindDir %in% c("320", "330", "340") ~ "NNW",
TRUE ~ "Error"
),
levels=c("No Wind", "Variable", "Error",
"N", "NNE", "ENE",
"E", "ESE", "SSE",
"S", "SSW", "WSW",
"W", "WNW", "NNW"
)
)
)
# Rempve the errors and calculate percentages by month for the remainder
processedWindData <- windPlotData %>%
filter(windDirGroup != "Error") %>%
group_by(month, windDirGroup) %>%
summarize(n=n()) %>%
ungroup() %>%
group_by(month) %>%
mutate(pct=n/sum(n)) %>%
ungroup() %>%
mutate(src=src)
processedWindData
}
The function conslidatePlotWind() then calls getWindDirGroup() for any number of files:
# Consolidate and plot wind data
consolidatePlotWind <- function(files, names) {
consFun <- function(x, y) { getWindDirGroup(met=x, src=y) }
boundByRows <- map2_dfr(.x=files, .y=names, .f=consFun)
# Show frequency by month for each city, faceted by wind direction
p1 <- boundByRows %>%
ggplot(aes(x=month, y=pct, color=src)) +
geom_line(aes(group=src)) +
facet_wrap(~windDirGroup) +
labs(title="Wind Frequency by Month",
x="Month",
y="Frequency of Wind Observations"
) +
theme(axis.text.x=element_text(angle=90))
print(p1)
# Show frequency by wind direction for each city, faceted by month
p2 <- boundByRows %>%
ggplot(aes(x=windDirGroup, y=pct, color=src)) +
geom_line(aes(group=src)) +
facet_wrap(~month) +
labs(title="Wind Frequency by Wind Direction",
x="Wind Direction",
y="Frequency of Wind Observations"
) +
theme(axis.text.x=element_text(angle=90))
print(p2)
boundByRows
}
# Load the Las Vegas data and New Orleans data for comparison
kmsy_2016 <- readRDS("./RInputFiles/ProcessedMETAR/metar_kmsy_2016.rds")
klas_2016 <- readRDS("./RInputFiles/ProcessedMETAR/metar_klas_2016.rds")
# Run wind by month comparisons for Detroit, Las Vegas, New Orleans
consolidatePlotWind(files=list(kdtw_2016, klas_2016, kmsy_2016),
names=c("Detroit, MI (2016)", "Las Vegas, NV (2016)", "New Orleans, LA (2016)")
)
## # A tibble: 504 x 5
## month windDirGroup n pct src
## <fct> <fct> <int> <dbl> <chr>
## 1 Jan No Wind 46 0.0600 Detroit, MI (2016)
## 2 Jan Variable 3 0.00391 Detroit, MI (2016)
## 3 Jan N 26 0.0339 Detroit, MI (2016)
## 4 Jan NNE 26 0.0339 Detroit, MI (2016)
## 5 Jan ENE 9 0.0117 Detroit, MI (2016)
## 6 Jan E 16 0.0209 Detroit, MI (2016)
## 7 Jan ESE 14 0.0183 Detroit, MI (2016)
## 8 Jan SSE 72 0.0939 Detroit, MI (2016)
## 9 Jan S 131 0.171 Detroit, MI (2016)
## 10 Jan SSW 158 0.206 Detroit, MI (2016)
## # ... with 494 more rows
The functions can be combined so that a full process can be run for a given file:
# File name to city name mapper
cityNameMapper <- c(kdtw_2016="Detroit, MI (2016)",
kewr_2016="Newark, NJ (2016)",
kgrb_2016="Green Bay, WI (2016)",
kgrr_2016="Grand Rapids, MI (2016)",
kiah_2016="Houston, TX (2016)",
kind_2016="Indianapolis, IN (2016)",
klas_2015="Las Vegas, NV (2015)",
klas_2016="Las Vegas, NV (2016)",
klas_2017="Las Vegas, NV (2017)",
klnk_2016="Lincoln, NE (2016)",
kmke_2016="Milwaukee, WI (2016)",
kmsn_2016="Madison, WI (2016)",
kmsp_2016="Minneapolis, MN (2016)",
kmsy_2015="New Orleans, LA (2015)",
kmsy_2016="New Orleans, LA (2016)",
kmsy_2017="New Orleans, LA (2017)",
kord_2015="Chicago, IL (2015)",
kord_2016="Chicago, IL (2016)",
kord_2017="Chicago, IL (2017)",
ksan_2015="San Diego, CA (2015)",
ksan_2016="San Diego, CA (2016)",
ksan_2017="San Diego, CA (2017)",
ktvc_2016="Traverse City, MI (2016)"
)
# This is a helper function to create a locale description
getLocaleDescription <- function(x, mapper=cityNameMapper) {
# Initialize the description as NULL
desc <- NULL
for (potMatch in names(mapper)) {
if (str_detect(string=x, pattern=potMatch)) {
desc <- mapper[potMatch]
break
}
}
# If the mapping failed, use UNMAPPED_x as the description
if (is.null(desc)) {
desc <- paste0("UNMAPPED_", x)
cat("\nUnable to find a description, will use ", desc, "\n\n", sep="")
} else {
cat("\nWill use ", desc, " as the description for ", x, "\n\n", sep="")
}
# Return the descriptive name
desc
}
# The following function runs the functions that work on a single data source
combinedEDA <- function(filename=NULL,
tbl=NULL,
desc=NULL,
mets=c("WindDir", "WindSpeed", "TempC", "DewC", "Altimeter",
"modSLP", "cType1", "cLevel1", "month", "day"
),
corPairs=list(c("TempC", "TempF"),
c("TempC", "DewC"),
c("Altimeter", "modSLP"),
c("Altimeter", "WindSpeed")
),
fctPairs=list(c("month", "TempF"),
c("month", "DewF"),
c("month", "WindSpeed"),
c("month", "Altimeter"),
c("wType", "Visibility"),
c("wType", "WindSpeed"),
c("WindDir", "WindSpeed"),
c("WindDir", "TempF")
),
heatVars=c("TempC", "TempF",
"DewC", "DewF",
"Altimeter", "modSLP",
"WindSpeed", "Visibility",
"monthint", "day"
),
lmVars=list(c("modSLP", "Altimeter"),
c("modSLP", "Altimeter + TempF"),
c("TempF", "DewF"),
c("WindSpeed", "Altimeter + TempF + DewF + month")
),
mapVariables=varMapper,
mapFileNames=cityNameMapper,
path="./RInputFiles/ProcessedMETAR/"
) {
# Require that either filename OR tbl be passed
if (is.null(filename) & is.null(tbl)) {
cat("\nMust provide either a filename or an already-loaded tibble\n")
stop("\nfilename=NULL and tbl=NULL may not both be passed to combinedEDA()\n")
}
# Require that either 1) filename and mapFileNames, OR 2) desc be passed
if ((is.null(filename) | is.null(mapFileNames)) & is.null(desc)) {
cat("\nMust provide either a filename with mapFileNames or a file description\n")
stop("\nWhen desc=NULL must have non-null entries for both filename= and mapFileNames=\n")
}
# Find the description if it is NULL (default)
if (is.null(desc)) {
desc <- getLocaleDescription(filename, mapper=mapFileNames)
}
# Warn if both filename and tbl are passed, since tbl will be used
if (!is.null(filename) & !is.null(tbl)) {
cat("\nA tibble has been passed and will be used as the dataset for this function\n")
warning("\nArgument filename=", filename, " is NOT loaded since a tibble was passed\n")
}
# Read in the file unless tbl has already been passed to the routine
if (is.null(tbl)) {
tbl <- readRDS(paste0(path, filename))
}
# Plot counts by metric
plotcountsByMetric(tbl, mets=mets, title=desc, diagnose=TRUE)
# Plot relationships between two variables
for (ys in corPairs) {
plotNumCor(tbl, var1=ys[1], var2=ys[2], subT=desc, diagnose=TRUE)
}
# plot numeric vs. factor
for (ys in fctPairs) {
plotFactorNumeric(tbl, fctVar=ys[1], numVar=ys[2], subT=desc, showXLabel=FALSE, diagnose=TRUE)
}
# Heatmap for variable correlations
corMETAR(tbl, numVars=heatVars, subT=paste0(desc, " METAR"))
# Run linear rergression
for (ys in lmVars) {
lmMETAR(tbl, y=ys[1], x=ys[2], yName=varMapper[ys[1]], subT=desc)
}
# Run the basic wind plots
basicWindPlots(tbl, desc=desc, gran="")
# Return the tibble
tbl
}
The process can then be run for each of Detroit (2016), Las Vegas (2016), and New Orleans (2016):
# Retrieve the Detroit, MI (2016) data
kdtw_2016 <- combinedEDA("metar_kdtw_2016.rds")
##
## Will use Detroit, MI (2016) as the description for metar_kdtw_2016.rds
##
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 651 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Dropping 1 rows with 49 observations due to NA
##
## Removing 49 records due to NA
##
## Removing 49 records due to NA
##
## Removing 49 records due to NA
##
## Removing 49 records due to NA
##
## Removing 49 records due to NA
##
## Removing 49 records due to NA
##
## Removing 49 records due to NA
##
## Removing 49 records due to NA
##
## *** Correlations use 8769 complete cases (99.4% of 8818 total) ***
## TempC TempF DewC DewF Altimeter modSLP WindSpeed Visibility
## TempC 1.00 1.00 0.92 0.92 -0.17 -0.24 -0.10 0.14
## TempF 1.00 1.00 0.92 0.92 -0.17 -0.24 -0.10 0.14
## DewC 0.92 0.92 1.00 1.00 -0.22 -0.28 -0.18 -0.07
## DewF 0.92 0.92 1.00 1.00 -0.22 -0.28 -0.18 -0.07
## Altimeter -0.17 -0.17 -0.22 -0.22 1.00 1.00 -0.37 0.15
## modSLP -0.24 -0.24 -0.28 -0.28 1.00 1.00 -0.35 0.14
## WindSpeed -0.10 -0.10 -0.18 -0.18 -0.37 -0.35 1.00 0.10
## Visibility 0.14 0.14 -0.07 -0.07 0.15 0.14 0.10 1.00
## monthint 0.24 0.24 0.32 0.32 0.19 0.17 -0.06 -0.11
## day 0.04 0.04 0.03 0.03 -0.02 -0.02 0.06 0.04
## monthint day
## TempC 0.24 0.04
## TempF 0.24 0.04
## DewC 0.32 0.03
## DewF 0.32 0.03
## Altimeter 0.19 -0.02
## modSLP 0.17 -0.02
## WindSpeed -0.06 0.06
## Visibility -0.11 0.04
## monthint 1.00 0.02
## day 0.02 1.00
##
## *** Regression call is: modSLP ~ Altimeter ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96361 -0.44022 -0.03758 0.40772 1.41448
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -16.44389 0.74723 -22.01 <2e-16 ***
## Altimeter 34.41514 0.02488 1383.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4961 on 8767 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.9954, Adjusted R-squared: 0.9954
## F-statistic: 1.913e+06 on 1 and 8767 DF, p-value: < 2.2e-16
##
## *** Regression call is: modSLP ~ Altimeter + TempF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.57802 -0.12674 0.00481 0.12820 0.65708
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.713e+00 2.803e-01 -13.25 <2e-16 ***
## Altimeter 3.403e+01 9.302e-03 3658.80 <2e-16 ***
## TempF -2.351e-02 9.941e-05 -236.48 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1826 on 8766 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.9994, Adjusted R-squared: 0.9994
## F-statistic: 7.086e+06 on 2 and 8766 DF, p-value: < 2.2e-16
##
## *** Regression call is: TempF ~ DewF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.132 -6.065 -2.083 4.042 31.029
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.060663 0.211976 52.18 <2e-16 ***
## DewF 1.000977 0.004677 214.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.987 on 8767 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.8393, Adjusted R-squared: 0.8393
## F-statistic: 4.58e+04 on 1 and 8767 DF, p-value: < 2.2e-16
##
## *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.3544 -2.5454 -0.1494 2.3669 20.6042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 306.827720 6.726512 45.615 < 2e-16 ***
## Altimeter -9.980907 0.222840 -44.790 < 2e-16 ***
## TempF 0.206384 0.006044 34.145 < 2e-16 ***
## DewF -0.217533 0.006755 -32.204 < 2e-16 ***
## monthFeb 0.188758 0.204057 0.925 0.3550
## monthMar -1.375926 0.211244 -6.513 7.75e-11 ***
## monthApr -2.406187 0.217870 -11.044 < 2e-16 ***
## monthMay -3.616016 0.248432 -14.555 < 2e-16 ***
## monthJun -4.337974 0.277532 -15.631 < 2e-16 ***
## monthJul -3.428235 0.300872 -11.394 < 2e-16 ***
## monthAug -2.971582 0.312087 -9.522 < 2e-16 ***
## monthSep -1.839206 0.292614 -6.285 3.43e-10 ***
## monthOct -0.477868 0.255640 -1.869 0.0616 .
## monthNov -0.013008 0.226830 -0.057 0.9543
## monthDec 2.071836 0.200848 10.315 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.883 on 8754 degrees of freedom
## (49 observations deleted due to missingness)
## Multiple R-squared: 0.3218, Adjusted R-squared: 0.3207
## F-statistic: 296.7 on 14 and 8754 DF, p-value: < 2.2e-16
# Retrieve the Las Vegas, NV (2016) data
klas_2016 <- combinedEDA("metar_klas_2016.rds")
##
## Will use Las Vegas, NV (2016) as the description for metar_klas_2016.rds
##
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 2440 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Dropping 1 rows with 35 observations due to NA
##
## Removing 35 records due to NA
##
## Removing 35 records due to NA
##
## Removing 35 records due to NA
##
## Removing 35 records due to NA
##
## Removing 35 records due to NA
##
## Removing 35 records due to NA
##
## Removing 35 records due to NA
##
## Removing 35 records due to NA
##
## *** Correlations use 8783 complete cases (99.6% of 8818 total) ***
## TempC TempF DewC DewF Altimeter modSLP WindSpeed Visibility
## TempC 1.00 1.00 0.22 0.22 -0.51 -0.62 0.22 0.01
## TempF 1.00 1.00 0.22 0.22 -0.51 -0.62 0.22 0.01
## DewC 0.22 0.22 1.00 1.00 -0.24 -0.27 -0.04 -0.13
## DewF 0.22 0.22 1.00 1.00 -0.24 -0.27 -0.04 -0.13
## Altimeter -0.51 -0.51 -0.24 -0.24 1.00 0.99 -0.38 0.06
## modSLP -0.62 -0.62 -0.27 -0.27 0.99 1.00 -0.38 0.06
## WindSpeed 0.22 0.22 -0.04 -0.04 -0.38 -0.38 1.00 -0.02
## Visibility 0.01 0.01 -0.13 -0.13 0.06 0.06 -0.02 1.00
## monthint 0.14 0.14 0.06 0.06 0.06 0.03 -0.01 -0.02
## day -0.01 -0.01 0.03 0.03 -0.01 -0.01 0.00 -0.07
## monthint day
## TempC 0.14 -0.01
## TempF 0.14 -0.01
## DewC 0.06 0.03
## DewF 0.06 0.03
## Altimeter 0.06 -0.01
## modSLP 0.03 -0.01
## WindSpeed -0.01 0.00
## Visibility -0.02 -0.07
## monthint 1.00 0.02
## day 0.02 1.00
##
## *** Regression call is: modSLP ~ Altimeter ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1381 -0.7848 -0.0607 0.6891 3.6362
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -116.30984 1.72975 -67.24 <2e-16 ***
## Altimeter 37.68826 0.05776 652.51 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9753 on 8781 degrees of freedom
## (35 observations deleted due to missingness)
## Multiple R-squared: 0.9798, Adjusted R-squared: 0.9798
## F-statistic: 4.258e+05 on 1 and 8781 DF, p-value: < 2.2e-16
##
## *** Regression call is: modSLP ~ Altimeter + TempF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.18161 -0.33113 0.02395 0.33395 1.08509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.537e+01 8.693e-01 -29.18 <2e-16 ***
## Altimeter 3.478e+01 2.868e-02 1212.89 <2e-16 ***
## TempF -5.581e-02 2.813e-04 -198.42 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4165 on 8780 degrees of freedom
## (35 observations deleted due to missingness)
## Multiple R-squared: 0.9963, Adjusted R-squared: 0.9963
## F-statistic: 1.187e+06 on 2 and 8780 DF, p-value: < 2.2e-16
##
## *** Regression call is: TempF ~ DewF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.026 -14.381 -0.341 13.299 46.312
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.12523 0.49251 126.14 <2e-16 ***
## DewF 0.32810 0.01576 20.82 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.94 on 8781 degrees of freedom
## (35 observations deleted due to missingness)
## Multiple R-squared: 0.04705, Adjusted R-squared: 0.04694
## F-statistic: 433.5 on 1 and 8781 DF, p-value: < 2.2e-16
##
## *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.1199 -2.7665 -0.6134 2.0975 22.2630
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 316.166903 9.583584 32.990 < 2e-16 ***
## Altimeter -10.363527 0.316290 -32.766 < 2e-16 ***
## TempF 0.012196 0.005239 2.328 0.019933 *
## DewF -0.053040 0.004107 -12.915 < 2e-16 ***
## monthFeb 1.477528 0.225729 6.546 6.26e-11 ***
## monthMar 0.848312 0.232319 3.652 0.000262 ***
## monthApr 1.590035 0.242532 6.556 5.84e-11 ***
## monthMay 0.327608 0.261841 1.251 0.210905
## monthJun 0.141204 0.319290 0.442 0.658325
## monthJul 1.092983 0.328362 3.329 0.000876 ***
## monthAug 0.545515 0.319165 1.709 0.087450 .
## monthSep 0.647423 0.281377 2.301 0.021420 *
## monthOct 1.147640 0.253286 4.531 5.95e-06 ***
## monthNov 1.097005 0.227198 4.828 1.40e-06 ***
## monthDec 0.672432 0.214810 3.130 0.001752 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.181 on 8768 degrees of freedom
## (35 observations deleted due to missingness)
## Multiple R-squared: 0.1774, Adjusted R-squared: 0.176
## F-statistic: 135 on 14 and 8768 DF, p-value: < 2.2e-16
# Retrieve the New Orleans, LA (2016) data
kmsy_2016 <- combinedEDA("metar_kmsy_2016.rds")
##
## Will use New Orleans, LA (2016) as the description for metar_kmsy_2016.rds
##
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 1029 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Dropping 1 rows with 14 observations due to NA
##
## Removing 14 records due to NA
##
## Removing 14 records due to NA
##
## Removing 14 records due to NA
##
## Removing 14 records due to NA
##
## Removing 14 records due to NA
##
## Removing 14 records due to NA
##
## Removing 14 records due to NA
##
## Removing 14 records due to NA
##
## *** Correlations use 8799 complete cases (99.8% of 8813 total) ***
## TempC TempF DewC DewF Altimeter modSLP WindSpeed Visibility
## TempC 1.00 1.00 0.85 0.85 -0.48 -0.47 -0.09 0.13
## TempF 1.00 1.00 0.85 0.85 -0.48 -0.48 -0.09 0.13
## DewC 0.85 0.85 1.00 1.00 -0.56 -0.56 -0.21 -0.06
## DewF 0.85 0.85 1.00 1.00 -0.57 -0.56 -0.21 -0.05
## Altimeter -0.48 -0.48 -0.56 -0.57 1.00 1.00 -0.08 0.08
## modSLP -0.47 -0.48 -0.56 -0.56 1.00 1.00 -0.08 0.08
## WindSpeed -0.09 -0.09 -0.21 -0.21 -0.08 -0.08 1.00 0.03
## Visibility 0.13 0.13 -0.06 -0.05 0.08 0.08 0.03 1.00
## monthint 0.26 0.26 0.31 0.31 0.09 0.09 -0.22 -0.02
## day 0.03 0.03 0.07 0.07 0.02 0.02 -0.05 0.01
## monthint day
## TempC 0.26 0.03
## TempF 0.26 0.03
## DewC 0.31 0.07
## DewF 0.31 0.07
## Altimeter 0.09 0.02
## modSLP 0.09 0.02
## WindSpeed -0.22 -0.05
## Visibility -0.02 0.01
## monthint 1.00 0.02
## day 0.02 1.00
##
## *** Regression call is: modSLP ~ Altimeter ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.232515 -0.083700 0.000544 0.084787 0.227320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.773456 0.230067 12.05 <2e-16 ***
## Altimeter 33.779607 0.007654 4413.27 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1015 on 8797 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.9995, Adjusted R-squared: 0.9995
## F-statistic: 1.948e+07 on 1 and 8797 DF, p-value: < 2.2e-16
##
## *** Regression call is: modSLP ~ Altimeter + TempF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.222066 -0.082359 -0.001036 0.083533 0.217460
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.557e+00 2.638e-01 5.903 3.7e-09 ***
## Altimeter 3.382e+01 8.667e-03 3902.029 < 2e-16 ***
## TempF 8.751e-04 9.432e-05 9.277 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.101 on 8796 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.9996, Adjusted R-squared: 0.9996
## F-statistic: 9.833e+06 on 2 and 8796 DF, p-value: < 2.2e-16
##
## *** Regression call is: TempF ~ DewF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.051 -4.777 -1.265 4.199 26.477
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.440028 0.319858 76.41 <2e-16 ***
## DewF 0.778877 0.005065 153.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.763 on 8797 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.7288, Adjusted R-squared: 0.7288
## F-statistic: 2.364e+04 on 1 and 8797 DF, p-value: < 2.2e-16
##
## *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.9452 -2.5710 -0.2247 2.2592 18.9024
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 190.986029 11.812622 16.168 < 2e-16 ***
## Altimeter -6.223347 0.387357 -16.066 < 2e-16 ***
## TempF 0.225378 0.007290 30.915 < 2e-16 ***
## DewF -0.169883 0.006513 -26.082 < 2e-16 ***
## monthFeb -0.787460 0.204817 -3.845 0.000122 ***
## monthMar -0.270554 0.215758 -1.254 0.209885
## monthApr -1.941174 0.225642 -8.603 < 2e-16 ***
## monthMay -3.763407 0.241926 -15.556 < 2e-16 ***
## monthJun -5.215285 0.269852 -19.326 < 2e-16 ***
## monthJul -5.568010 0.283515 -19.639 < 2e-16 ***
## monthAug -4.834769 0.277122 -17.446 < 2e-16 ***
## monthSep -5.769099 0.272789 -21.149 < 2e-16 ***
## monthOct -4.812814 0.245841 -19.577 < 2e-16 ***
## monthNov -2.827680 0.220138 -12.845 < 2e-16 ***
## monthDec -0.203742 0.210552 -0.968 0.333241
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.847 on 8784 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.2278, Adjusted R-squared: 0.2266
## F-statistic: 185.1 on 14 and 8784 DF, p-value: < 2.2e-16
There is a lot of good information in the EDA, but it can be overwhelming to have everything in one place. Perhaps a wrapper function can be built allowing for outputs to be piped to a given file:
wrapCombinedEDA <- function(readFile,
readPath="./RInputFiles/ProcessedMETAR/",
mapFileNames=cityNameMapper,
desc=NULL,
writeLogFile=NULL,
writeLogPDF=NULL,
writeLogPath=NULL,
appendWriteFile=FALSE,
...
) {
# Read in the requested file
tbl <- readRDS(paste0(readPath, readFile))
# Find the description if it has not been passed
if (is.null(desc)) {
desc <- getLocaleDescription(readFile, mapper=mapFileNames)
}
# Helper function that only runs the combinedEDA() routine
coreFunc <- function() { combinedEDA(tbl=tbl, desc=desc, mapFileNames=mapFileNames, ...) }
# If writeLogPDF is not NULL, direct the graphs to a suitable PDF
if (!is.null(writeLogPDF)) {
# Prepend the provided log path if it has not been made available
if (!is.null(writeLogPath)) {
writeLogPDF <- paste0(writeLogPath, writeLogPDF)
}
# Provide the location of the EDA pdf file
cat("\nEDA PDF file is available at:", writeLogPDF, "\n")
# Redirect the writing to writeLogPDF
pdf(writeLogPDF)
}
# Run EDA on the tbl using capture.output to redirect to a log file if specified
if (!is.null(writeLogFile)) {
# Prepend the provided log path if it has not been made available
if (!is.null(writeLogPath)) {
writeLogFile <- paste0(writeLogPath, writeLogFile)
}
# Provide the location of the EDA log file
cat("\nEDA log file is available at:", writeLogFile, "\n")
# Run EDA such that the output goes to the log file
capture.output(coreFunc(),
file=writeLogFile,
append=appendWriteFile
)
} else {
# Run EDA such that output stays in stdout
coreFunc()
}
# If writeLogPDF is not NULL, redirect to stdout
if (!is.null(writeLogPDF)) {
dev.off()
}
# Return the tbl
tbl
}
filePath <- "./RInputFiles/ProcessedMETAR/"
# Example for the basic function for Chicago, IL (2016) written to stdout
kord_2016 <- wrapCombinedEDA("metar_kord_2016.rds", readPath=filePath)
##
## Will use Chicago, IL (2016) as the description for metar_kord_2016.rds
##
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 823 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Dropping 1 rows with 10 observations due to NA
##
## Removing 10 records due to NA
##
## Removing 10 records due to NA
##
## Removing 10 records due to NA
##
## Removing 10 records due to NA
##
## Removing 10 records due to NA
##
## Removing 10 records due to NA
##
## Removing 10 records due to NA
##
## Removing 10 records due to NA
##
## *** Correlations use 8805 complete cases (99.9% of 8815 total) ***
## TempC TempF DewC DewF Altimeter modSLP WindSpeed Visibility
## TempC 1.00 1.00 0.93 0.93 -0.22 -0.29 -0.12 0.19
## TempF 1.00 1.00 0.93 0.93 -0.22 -0.29 -0.12 0.19
## DewC 0.93 0.93 1.00 1.00 -0.27 -0.34 -0.19 0.05
## DewF 0.93 0.93 1.00 1.00 -0.28 -0.34 -0.19 0.05
## Altimeter -0.22 -0.22 -0.27 -0.28 1.00 1.00 -0.31 0.19
## modSLP -0.29 -0.29 -0.34 -0.34 1.00 1.00 -0.29 0.17
## WindSpeed -0.12 -0.12 -0.19 -0.19 -0.31 -0.29 1.00 0.01
## Visibility 0.19 0.19 0.05 0.05 0.19 0.17 0.01 1.00
## monthint 0.24 0.24 0.27 0.27 0.18 0.16 -0.09 0.02
## day 0.05 0.05 0.05 0.05 -0.06 -0.06 0.08 0.04
## monthint day
## TempC 0.24 0.05
## TempF 0.24 0.05
## DewC 0.27 0.05
## DewF 0.27 0.05
## Altimeter 0.18 -0.06
## modSLP 0.16 -0.06
## WindSpeed -0.09 0.08
## Visibility 0.02 0.04
## monthint 1.00 0.02
## day 0.02 1.00
##
## *** Regression call is: modSLP ~ Altimeter ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.31041 -0.47561 -0.07852 0.43888 1.66927
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -23.11596 0.85978 -26.89 <2e-16 ***
## Altimeter 34.63779 0.02864 1209.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5573 on 8803 degrees of freedom
## (10 observations deleted due to missingness)
## Multiple R-squared: 0.994, Adjusted R-squared: 0.994
## F-statistic: 1.463e+06 on 1 and 8803 DF, p-value: < 2.2e-16
##
## *** Regression call is: modSLP ~ Altimeter + TempF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.69460 -0.12269 0.00055 0.12142 0.73628
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.603e+00 2.864e-01 -16.07 <2e-16 ***
## Altimeter 3.407e+01 9.502e-03 3585.30 <2e-16 ***
## TempF -2.606e-02 9.503e-05 -274.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1804 on 8802 degrees of freedom
## (10 observations deleted due to missingness)
## Multiple R-squared: 0.9994, Adjusted R-squared: 0.9994
## F-statistic: 7.018e+06 on 2 and 8802 DF, p-value: < 2.2e-16
##
## *** Regression call is: TempF ~ DewF ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.638 -5.485 -1.570 3.601 35.456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.317877 0.188557 54.72 <2e-16 ***
## DewF 1.004504 0.004095 245.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.407 on 8803 degrees of freedom
## (10 observations deleted due to missingness)
## Multiple R-squared: 0.8724, Adjusted R-squared: 0.8724
## F-statistic: 6.019e+04 on 1 and 8803 DF, p-value: < 2.2e-16
##
## *** Regression call is: WindSpeed ~ Altimeter + TempF + DewF + month ***
##
## Call:
## lm(formula = formula(myChar), data = met)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.295 -2.635 -0.157 2.494 21.356
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 291.341063 7.156664 40.709 < 2e-16 ***
## Altimeter -9.415029 0.237118 -39.706 < 2e-16 ***
## TempF 0.160498 0.006445 24.904 < 2e-16 ***
## DewF -0.205407 0.006912 -29.718 < 2e-16 ***
## monthFeb 0.649436 0.210063 3.092 0.001997 **
## monthMar 0.033603 0.220448 0.152 0.878850
## monthApr 0.092196 0.230608 0.400 0.689317
## monthMay -0.859361 0.258300 -3.327 0.000882 ***
## monthJun -1.564544 0.296635 -5.274 1.36e-07 ***
## monthJul -1.191767 0.314437 -3.790 0.000152 ***
## monthAug -1.319054 0.319647 -4.127 3.72e-05 ***
## monthSep 0.394263 0.299972 1.314 0.188768
## monthOct 0.511074 0.264003 1.936 0.052916 .
## monthNov 0.050679 0.235873 0.215 0.829882
## monthDec 1.741405 0.205243 8.485 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.984 on 8790 degrees of freedom
## (10 observations deleted due to missingness)
## Multiple R-squared: 0.2494, Adjusted R-squared: 0.2482
## F-statistic: 208.6 on 14 and 8790 DF, p-value: < 2.2e-16
# Example for the basic function for San Diego, CA (2016) written to 'metar_ksan_2016_EDA.log'
ksan_2016 <- wrapCombinedEDA("metar_ksan_2016.rds",
readPath=filePath,
writeLogFile='metar_ksan_2016_EDA.log',
writeLogPDF='metar_ksan_2016_EDA.pdf',
writeLogPath=filePath
)
##
## Will use San Diego, CA (2016) as the description for metar_ksan_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.log
Finally, a function can be called to create the inputs to wrapCombinedEDA() for lower typing:
logAndPDFCombinedEDA <- function(tblName, filePath="./RInputFiles/ProcessedMETAR/") {
# Create the RDS file name
rdsName <- paste0("metar_", tblName, ".rds")
cat("\nRDS Name:", rdsName)
# Create the log file name
logName <- paste0("metar_", tblName, "_EDA.log")
cat("\nLog Name:", logName)
# Create the PDF file name
pdfName <- paste0("metar_", tblName, "_EDA.pdf")
cat("\nPDF Name:", pdfName)
# Call wrapCombinedEDA()
tbl <- wrapCombinedEDA(rdsName,
readPath=filePath,
writeLogFile=logName,
writeLogPDF=pdfName,
writeLogPath=filePath
)
# Return the tbl
tbl
}
This function can then be run for all of the relevant files (cached to avoid multiple runs):
# Run for 2016 only for kdtw, kewr, kgrb, kgrr, kiah, kind, klnk, kmke, kmsn, kmsp, ktvc
kdtw_2016 <- logAndPDFCombinedEDA("kdtw_2016")
##
## RDS Name: metar_kdtw_2016.rds
## Log Name: metar_kdtw_2016_EDA.log
## PDF Name: metar_kdtw_2016_EDA.pdf
## Will use Detroit, MI (2016) as the description for metar_kdtw_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kdtw_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kdtw_2016_EDA.log
kewr_2016 <- logAndPDFCombinedEDA("kewr_2016")
##
## RDS Name: metar_kewr_2016.rds
## Log Name: metar_kewr_2016_EDA.log
## PDF Name: metar_kewr_2016_EDA.pdf
## Will use Newark, NJ (2016) as the description for metar_kewr_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kewr_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kewr_2016_EDA.log
kgrb_2016 <- logAndPDFCombinedEDA("kgrb_2016")
##
## RDS Name: metar_kgrb_2016.rds
## Log Name: metar_kgrb_2016_EDA.log
## PDF Name: metar_kgrb_2016_EDA.pdf
## Will use Green Bay, WI (2016) as the description for metar_kgrb_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrb_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrb_2016_EDA.log
kgrr_2016 <- logAndPDFCombinedEDA("kgrr_2016")
##
## RDS Name: metar_kgrr_2016.rds
## Log Name: metar_kgrr_2016_EDA.log
## PDF Name: metar_kgrr_2016_EDA.pdf
## Will use Grand Rapids, MI (2016) as the description for metar_kgrr_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrr_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kgrr_2016_EDA.log
kiah_2016 <- logAndPDFCombinedEDA("kiah_2016")
##
## RDS Name: metar_kiah_2016.rds
## Log Name: metar_kiah_2016_EDA.log
## PDF Name: metar_kiah_2016_EDA.pdf
## Will use Houston, TX (2016) as the description for metar_kiah_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kiah_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kiah_2016_EDA.log
kind_2016 <- logAndPDFCombinedEDA("kind_2016")
##
## RDS Name: metar_kind_2016.rds
## Log Name: metar_kind_2016_EDA.log
## PDF Name: metar_kind_2016_EDA.pdf
## Will use Indianapolis, IN (2016) as the description for metar_kind_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kind_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kind_2016_EDA.log
klnk_2016 <- logAndPDFCombinedEDA("klnk_2016")
##
## RDS Name: metar_klnk_2016.rds
## Log Name: metar_klnk_2016_EDA.log
## PDF Name: metar_klnk_2016_EDA.pdf
## Will use Lincoln, NE (2016) as the description for metar_klnk_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klnk_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klnk_2016_EDA.log
kmke_2016 <- logAndPDFCombinedEDA("kmke_2016")
##
## RDS Name: metar_kmke_2016.rds
## Log Name: metar_kmke_2016_EDA.log
## PDF Name: metar_kmke_2016_EDA.pdf
## Will use Milwaukee, WI (2016) as the description for metar_kmke_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmke_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmke_2016_EDA.log
kmsn_2016 <- logAndPDFCombinedEDA("kmsn_2016")
##
## RDS Name: metar_kmsn_2016.rds
## Log Name: metar_kmsn_2016_EDA.log
## PDF Name: metar_kmsn_2016_EDA.pdf
## Will use Madison, WI (2016) as the description for metar_kmsn_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsn_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsn_2016_EDA.log
kmsp_2016 <- logAndPDFCombinedEDA("kmsp_2016")
##
## RDS Name: metar_kmsp_2016.rds
## Log Name: metar_kmsp_2016_EDA.log
## PDF Name: metar_kmsp_2016_EDA.pdf
## Will use Minneapolis, MN (2016) as the description for metar_kmsp_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsp_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsp_2016_EDA.log
ktvc_2016 <- logAndPDFCombinedEDA("ktvc_2016")
##
## RDS Name: metar_ktvc_2016.rds
## Log Name: metar_ktvc_2016_EDA.log
## PDF Name: metar_ktvc_2016_EDA.pdf
## Will use Traverse City, MI (2016) as the description for metar_ktvc_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ktvc_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ktvc_2016_EDA.log
# Run for 2015-2016-2017 for klas, kmsy, kord, ksan
klas_2015 <- logAndPDFCombinedEDA("klas_2015")
##
## RDS Name: metar_klas_2015.rds
## Log Name: metar_klas_2015_EDA.log
## PDF Name: metar_klas_2015_EDA.pdf
## Will use Las Vegas, NV (2015) as the description for metar_klas_2015.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2015_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2015_EDA.log
klas_2016 <- logAndPDFCombinedEDA("klas_2016")
##
## RDS Name: metar_klas_2016.rds
## Log Name: metar_klas_2016_EDA.log
## PDF Name: metar_klas_2016_EDA.pdf
## Will use Las Vegas, NV (2016) as the description for metar_klas_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2016_EDA.log
klas_2017 <- logAndPDFCombinedEDA("klas_2017")
##
## RDS Name: metar_klas_2017.rds
## Log Name: metar_klas_2017_EDA.log
## PDF Name: metar_klas_2017_EDA.pdf
## Will use Las Vegas, NV (2017) as the description for metar_klas_2017.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2017_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_klas_2017_EDA.log
kmsy_2015 <- logAndPDFCombinedEDA("kmsy_2015")
##
## RDS Name: metar_kmsy_2015.rds
## Log Name: metar_kmsy_2015_EDA.log
## PDF Name: metar_kmsy_2015_EDA.pdf
## Will use New Orleans, LA (2015) as the description for metar_kmsy_2015.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2015_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2015_EDA.log
kmsy_2016 <- logAndPDFCombinedEDA("kmsy_2016")
##
## RDS Name: metar_kmsy_2016.rds
## Log Name: metar_kmsy_2016_EDA.log
## PDF Name: metar_kmsy_2016_EDA.pdf
## Will use New Orleans, LA (2016) as the description for metar_kmsy_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2016_EDA.log
kmsy_2017 <- logAndPDFCombinedEDA("kmsy_2017")
##
## RDS Name: metar_kmsy_2017.rds
## Log Name: metar_kmsy_2017_EDA.log
## PDF Name: metar_kmsy_2017_EDA.pdf
## Will use New Orleans, LA (2017) as the description for metar_kmsy_2017.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2017_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kmsy_2017_EDA.log
kord_2015 <- logAndPDFCombinedEDA("kord_2015")
##
## RDS Name: metar_kord_2015.rds
## Log Name: metar_kord_2015_EDA.log
## PDF Name: metar_kord_2015_EDA.pdf
## Will use Chicago, IL (2015) as the description for metar_kord_2015.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2015_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2015_EDA.log
kord_2016 <- logAndPDFCombinedEDA("kord_2016")
##
## RDS Name: metar_kord_2016.rds
## Log Name: metar_kord_2016_EDA.log
## PDF Name: metar_kord_2016_EDA.pdf
## Will use Chicago, IL (2016) as the description for metar_kord_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2016_EDA.log
kord_2017 <- logAndPDFCombinedEDA("kord_2017")
##
## RDS Name: metar_kord_2017.rds
## Log Name: metar_kord_2017_EDA.log
## PDF Name: metar_kord_2017_EDA.pdf
## Will use Chicago, IL (2017) as the description for metar_kord_2017.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2017_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_kord_2017_EDA.log
ksan_2015 <- logAndPDFCombinedEDA("ksan_2015")
##
## RDS Name: metar_ksan_2015.rds
## Log Name: metar_ksan_2015_EDA.log
## PDF Name: metar_ksan_2015_EDA.pdf
## Will use San Diego, CA (2015) as the description for metar_ksan_2015.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2015_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2015_EDA.log
ksan_2016 <- logAndPDFCombinedEDA("ksan_2016")
##
## RDS Name: metar_ksan_2016.rds
## Log Name: metar_ksan_2016_EDA.log
## PDF Name: metar_ksan_2016_EDA.pdf
## Will use San Diego, CA (2016) as the description for metar_ksan_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2016_EDA.log
ksan_2017 <- logAndPDFCombinedEDA("ksan_2017")
##
## RDS Name: metar_ksan_2017.rds
## Log Name: metar_ksan_2017_EDA.log
## PDF Name: metar_ksan_2017_EDA.pdf
## Will use San Diego, CA (2017) as the description for metar_ksan_2017.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2017_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ksan_2017_EDA.log
Reload Traverse City, MI (2016) due to a previous error (short-term fix to avoid re-running a full cache):
ktvc_2016 <- logAndPDFCombinedEDA("ktvc_2016")
##
## RDS Name: metar_ktvc_2016.rds
## Log Name: metar_ktvc_2016_EDA.log
## PDF Name: metar_ktvc_2016_EDA.pdf
## Will use Traverse City, MI (2016) as the description for metar_ktvc_2016.rds
##
##
## EDA PDF file is available at: ./RInputFiles/ProcessedMETAR/metar_ktvc_2016_EDA.pdf
##
## EDA log file is available at: ./RInputFiles/ProcessedMETAR/metar_ktvc_2016_EDA.log
The files are now available for further analysis, with individual PDF files for plots associated with each locale.
With EDA about each locale saved to .pdf and .log files, it is interesting to investigate comparisons among the various locales. For example, how do the temperature patterns by month for a given locale compare to the overall global median temperature patterns by month?
The existing functions contain most of the code needed to perform this. The main steps to add are:
The first step is to combine one or more processed files, with a column added for locale:
# Combine files from a character list
combineProcessedFiles <- function(charList, mapper=cityNameMapper) {
# Combine the objects represented by charList, and name the list items using charList
listFiles <- lapply(charList, FUN=function(x) { get(x) })
names(listFiles) <- charList
# Bind rows, and add the descriptive locale name as sourceName
tblFiles <- bind_rows(listFiles, .id="source") %>%
mutate(sourceName=mapper[source])
tblFiles
}
The 2016 data will be used to run the combined process:
# Grab all the data that ends in _2016
locales2016 <- ls() %>%
grep(pattern="_2016", value=TRUE)
cat("\nLocales used will be:\n\n", paste0(locales2016, collapse="\n"), "\n\n", sep="")
##
## Locales used will be:
##
## kdtw_2016
## kewr_2016
## kgrb_2016
## kgrr_2016
## kiah_2016
## kind_2016
## klas_2016
## klnk_2016
## kmke_2016
## kmsn_2016
## kmsp_2016
## kmsy_2016
## kord_2016
## ksan_2016
## ktvc_2016
# Combine the 2016 data
all2016Data <- combineProcessedFiles(locales2016)
# Show counts by sourceName
all2016Data %>%
count(source, sourceName)
## # A tibble: 15 x 3
## source sourceName n
## <chr> <chr> <int>
## 1 kdtw_2016 Detroit, MI (2016) 8818
## 2 kewr_2016 Newark, NJ (2016) 8821
## 3 kgrb_2016 Green Bay, WI (2016) 8803
## 4 kgrr_2016 Grand Rapids, MI (2016) 8812
## 5 kiah_2016 Houston, TX (2016) 8816
## 6 kind_2016 Indianapolis, IN (2016) 8767
## 7 klas_2016 Las Vegas, NV (2016) 8818
## 8 klnk_2016 Lincoln, NE (2016) 8813
## 9 kmke_2016 Milwaukee, WI (2016) 8808
## 10 kmsn_2016 Madison, WI (2016) 8798
## 11 kmsp_2016 Minneapolis, MN (2016) 8817
## 12 kmsy_2016 New Orleans, LA (2016) 8813
## 13 kord_2016 Chicago, IL (2016) 8815
## 14 ksan_2016 San Diego, CA (2016) 8810
## 15 ktvc_2016 Traverse City, MI (2016) 8814
The following global summaries will be useful:
Helper functions can be created for:
The function helperFactorNumeric is created to apply function f to numeric variable y by factor variable x:
# Helper function to get overall percentage by metric for variable x
helperCountsByMetric <- function(tbl, ctVar, sumOn="dummyVar") {
tbl %>%
mutate(dummyVar=1) %>%
select_at(vars(all_of(c(ctVar, sumOn)))) %>%
filter_all(all_vars(!is.na(.))) %>%
group_by_at(ctVar) %>%
summarize(n=sum(get(sumOn))) %>%
mutate(nPct=n/sum(n))
}
# Example run to get counts by greatest sky obscuration
helperCountsByMetric(all2016Data, ctVar="wType")
## # A tibble: 7 x 3
## wType n nPct
## <fct> <dbl> <dbl>
## 1 VV 761 0.00576
## 2 OVC 39480 0.299
## 3 BKN 31352 0.237
## 4 SCT 16212 0.123
## 5 FEW 19304 0.146
## 6 CLR 24433 0.185
## 7 Error 601 0.00455
# Helper function to get a geom_smooth for variable y vs variable x
helperNumCor <- function(tbl,
xVar,
yVar,
sumOn="dummyVar",
se=TRUE,
color="red",
method="lm",
lty=2
) {
# Generate the overall totals for sumOn by xVar and yVar
plotData <- tbl %>%
mutate(dummyVar=1) %>%
select_at(vars(all_of(c(xVar, yVar, sumOn)))) %>%
filter_all(all_vars(!is.na(.))) %>%
group_by_at(vars(all_of(c(xVar, yVar)))) %>%
summarize(nTotal=sum(get(sumOn)))
geom_smooth(data=plotData,
aes_string(x=xVar, y=yVar, weight="nTotal"),
se=se,
color=color,
method=method,
lty=lty
)
}
# Example run to get TempC vs DewC
helperNumCor(all2016Data, xVar="TempC", yVar="DewC")
## mapping: weight = ~nTotal, x = ~TempC, y = ~DewC
## geom_smooth: na.rm = FALSE, se = TRUE
## stat_smooth: na.rm = FALSE, se = TRUE, method = lm, formula = y ~ x
## position_identity
# Example for using the helper function on a plot
plotNumCor(kdtw_2016, var1="TempC", var2="DewC") +
helperNumCor(all2016Data, xVar="TempC", yVar="DewC")
# Helper function to calculate .f(numVar) by byVar
helperFactorNumeric <- function(tbl, .f, byVar, numVar, ...) {
tbl %>%
select_at(vars(all_of(c(byVar, numVar)))) %>%
filter_all(all_vars(!is.na(.))) %>%
group_by_at(byVar) %>%
summarize(helpFN=.f(get(numVar), ...))
}
# Example for getting median TempF by month
helperFactorNumeric(all2016Data, .f=median, byVar="month", numVar="TempF")
## # A tibble: 12 x 2
## month helpFN
## <fct> <dbl>
## 1 Jan 30.9
## 2 Feb 35.1
## 3 Mar 46.9
## 4 Apr 52.0
## 5 May 64.0
## 6 Jun 73.0
## 7 Jul 77
## 8 Aug 75.9
## 9 Sep 71.1
## 10 Oct 61.0
## 11 Nov 50
## 12 Dec 34.0
The function plotCountsByMetric() has been updated above to allow for facetting and plotting of the overall central tendency. Two examples are shown - the base from previous, and a facetted example:
# Previous Example for Detroit 2016 - using WindDir, cType1, month, wType
plotcountsByMetric(kdtw_2016,
mets=c("WindDir", "cType1", "month", "wType"),
title="Detroit, MI (2016)",
dropNA=TRUE,
diagnose=TRUE
)
##
## Dropping 1 rows with 49 observations due to NA
# Facetted example for kdtw_2016, kord_2016, klas_2016, ksan_2016
useData <- all2016Data %>%
filter(source %in% c("kdtw_2016", "klas_2016", "kord_2016", "ksan_2016"))
plotcountsByMetric(useData,
mets=c("WindDir", "cType1", "month", "wType"),
title="Comparison Across Locales (red dots are the median)",
dropNA=TRUE,
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE
)
##
## Dropping 4 rows with 117 observations due to NA
As observed previously, Las Vegas tends towards southerly winds while San Diego tends towards northwesterly winds and calm (direction 000) winds. Detroit is most likely to be overcast, while Las Vegas is most likely to be clear.
The function plotNumCor() has been updated above to allow for facetting and plotting of the overall central tendency. Two examples are shown - the base from previous, and a facetted example:
# Example for Detroit 2016 - using TempC and DewC
plotNumCor(kdtw_2016, var1="TempC", var2="DewC", subT="Detroit, MI (2016)", diagnose=TRUE)
##
## Dropping 1 rows with 49 observations due to NA
# Facetted example for kdtw_2016, kord_2016, klas_2016, ksan_2016
useData <- all2016Data %>%
filter(source %in% c("kdtw_2016", "klas_2016", "kord_2016", "ksan_2016"))
# Facetted plot for very highly correlated variables TempC and TempF
plotNumCor(useData,
var1="TempC",
var2="TempF",
subT="Comparison Across Locales (red dashed lines are the overall)",
dropNA=TRUE,
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE
)
##
## Dropping 4 rows with 117 observations due to NA
# Facetted plot for highly correlated variables TempF and DewF
plotNumCor(useData,
var1="TempF",
var2="DewF",
subT="Comparison Across Locales (red dashed lines are the overall)",
dropNA=TRUE,
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE
)
##
## Dropping 4 rows with 117 observations due to NA
The differences in the relationships between temperature and dew point stand out:
In contrast, of course, temperatures measured in C and F all follow the same pattern regardless of city
The function plotFactorNumeric() has been updated above to allow for facetting and plotting of the overall central tendency. Two examples are shown - the base from previous, and a facetted example:
# Example for Detroit 2016 - using TempF and month
plotFactorNumeric(kdtw_2016,
fctVar="month",
numVar="TempF",
subT="Detroit, MI (2016)",
showXLabel=FALSE,
diagnose=TRUE
)
##
## Removing 49 records due to NA
# Facetted example for kdtw_2016, kord_2016, klas_2016, ksan_2016
useData <- all2016Data %>%
filter(source %in% c("kdtw_2016", "klas_2016", "kord_2016", "ksan_2016"))
plotFactorNumeric(useData,
fctVar="month",
numVar="TempF",
subT="Overall median shown as red dot",
showXLabel=FALSE,
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE
)
##
## Removing 117 records due to NA
Las Vegas consistently runs above the overall median temperature, while Chicago and Detroit run below the overall median temperature, particularly during the cold season. San Diego has little temperature variation by month and is thus below the median in the warm season and above the median in the cold season.
It is now possible to re-run the EDA plotting routines, focused on the 2016 data:
# Create rounded TempF and DewF in all2016Data
all2016Data <- all2016Data %>%
mutate(TempF5=5*round(round(TempF)/5),
DewF5=5*round(round(DewF)/5),
WindSpeed5=5*round(WindSpeed/5),
Altimeter10=round(Altimeter, 1)
)
# Counts by Metric for all 2016 data
plotcountsByMetric(all2016Data,
mets=c("month", "year",
"WindDir", "WindSpeed5",
"Visibility", "Altimeter10",
"TempF5", "DewF5",
"wType"
),
title="Comparisons Across Locales (red dots are the median)",
facetOn="sourceName",
showCentral=TRUE
)
The cross-locale comparisons bring out a few salient features:
DATA VOLUMES:
* All locales have roughly the same amount of data by year and month, focused on 2016 and with 1-2 days on either side
WIND DIRECTION and WIND SPEED:
* Las Vegas has an excess of no/variable wind and of southerly winds, both appropriate for a desert
* Houston has an excess of no wind and southerly winds, both appropriate for the Gulf Coast
* San Diego has an excess of no wind and of northwesterly wind, both appropriate for the Pacific coast
* Chicago, Grand Rapids, Indianapolis, Minneapolis, and Newark all have lower occurences of no wind, appropriate for relatively cold mid-latitude cities
* Lincoln looks “about normal”, with the exception that it has slightly more southerly winds; Lincoln is the only Great Plains locale in the analysis, and it appears to be a blend of Gulf Coast and Wintry
* Detroit, Green Bay, Milwaukee, and New Orleans all look “about average”; this is not surprising in the first three cases given the predominance of cold, mid-latitude locales, but is surprising for New Orleans
* Madison and Traverse City are surprising in that both show a predominance of no/variable winds; this is uncommon for cold-weather cities in the mid-latitude and merits further examination
VISIBILITY:
* The overwhelming majority of visibilities are 10SM (the highest that is recorded; more or less means unlimited in the METAR)
* There is a data issue with a Visibility > 10 that should be addressed
* Las Vegas is slightly more likely than most to have unlimited Visibility and Detroit is slightly less likely than most to have unlimited visibility
ALTIMETER:
* Las Vegas skews low as appropriate for a high-altitude desert locale
* New Orleans and Houston show less variance, perhaps driven by being roughly at sea level and in close proximity to the Gulf of Mexico
* San Diego shows very low variance, perhaps driven by being roughly at sea level and in close proximity to the Pacific Ocean
TEMPERATURE:
* Houston, Las Vegas, and New Orleans skew warm as expected
* San Diego has ver low variance as expected
* At a gross level, the other cities look similar to the median, likely driven by the predominance of cold, mid-latitude locales in the data file
DEW POINT:
* Houston and New Orleans skew very high as expected
* Las Vegas skews very low as expected
* San Diego has very low variance as expected
SKY OBSCURATION:
* Lincoln and Green Bay are the most likely to be CLR (clear, no clouds on the automated sensor). This may be driven by a difference in maximum sensor heights, and is unexpected in Green Bay which should be frequently cloudy due to its latitude and proximity to a large body of water
* Detroit, Traverse City, Grand Rapids, and Minneapolis are especially likely to be overcast
* Las Vegas is especially likely to have few clouds or to be clear
Comparisons are run for a few of the numerical correlations:
# Example for 2016 - using mixes of WindSpeed, Altimeter, TempF, DewF, TempC, DewC
numCorList <- list(c("TempC", "TempF"),
c("DewC", "DewF"),
c("TempF", "DewF"),
c("Altimeter", "WindSpeed"),
c("Altimeter", "TempF")
)
# Run the list through plotNumCor()
for (x in numCorList) {
plotNumCor(all2016Data,
var1=x[1],
var2=x[2],
subT="Red dashed line is the overall slope",
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE
)
}
##
## Dropping 15 rows with 593 observations due to NA
##
## Dropping 15 rows with 593 observations due to NA
##
## Dropping 15 rows with 593 observations due to NA
##
## Dropping 15 rows with 593 observations due to NA
##
## Dropping 15 rows with 593 observations due to NA
The cross-locale comparisons bring out a few salient features:
FAHRENHEIT AND CELSIUS:
* As expected, TempF/C are perfectly correlated and DewF/C are perfectly correlated. Since the observations were taken in the US, the TempF/DewF data will be used (TempC/DewC are conversions from the measured TempF/DewF to match the international standard for METAR reporting)
TEMPERATURE AND DEW POINT:
* While many cities have different clusters of temperature/dew point, all but Las Vegas and San Diego follow a pattern where the temperature and the dew point tend to run together at a similar rate
* In Las Vegas, the dew point is largely independent of the temperature
* In San Diego, there is less variance in temperature and dew point, and a lower (but still obvious) tendency for tenmperature and dew point to rise/fall together
ALTIMETER AND WIND SPEED:
* As expected, when the altimeter rises, on average, the wind speed falls
* This tendency is less pronounced in Houston and New Orleans; and more pronounced in Las Vegas, Dan Diego, and Indianapolis
ALTIMETER AND TEMPERATURE:
* Overall, low temperatures and high altimeters tend to be observed together
* This is especially so in Houston, Las Vegas, and New Orleans
* This is more modest in Grand Rapids, Green Bay, and Traverse City
Comparisons are then run for numeric variables against factor variables:
# Modify windDir so that it is just N, NE, E, SE, S, SW, W, NW, 000, Variable
mod2016Data <- all2016Data %>%
mutate(tempDir=ifelse(is.na(WindDir) | WindDir %in% c("000", "VRB"), -1, as.numeric(WindDir)),
predomDir=factor(case_when(is.na(WindDir) ~ "Error",
WindDir=="000" ~ "000",
WindDir=="VRB" ~ "VRB",
tempDir >= 337.5 ~ "N",
tempDir <= 22.5 ~ "N",
tempDir <= 67.5 ~ "NE",
tempDir <= 112.5 ~ "E",
tempDir <= 157.5 ~ "SE",
tempDir <= 202.5 ~ "S",
tempDir <= 247.5 ~ "SW",
tempDir <= 292.5 ~ "W",
tempDir <= 337.5 ~ "NW",
TRUE ~ "Error"
),
levels=c("Error", "000", "VRB", "NE", "E", "SE", "S", "SW", "W", "NW", "N")
)
)
## Warning in ifelse(is.na(WindDir) | WindDir %in% c("000", "VRB"), -1,
## as.numeric(WindDir)): NAs introduced by coercion
# Key factor variables include month, wType, predomDir
# Key numeric variables include WindSpeed, Altimeter, TempF, DewF, Visibility
fctNumList <- list(c("month", "WindSpeed"),
c("month", "Altimeter"),
c("month", "TempF"),
c("month", "DewF"),
c("month", "Visibility"),
c("wType", "WindSpeed"),
c("wType", "Altimeter"),
c("wType", "Visibility"),
c("predomDir", "WindSpeed"),
c("predomDir", "Altimeter"),
c("predomDir", "TempF"),
c("predomDir", "DewF")
)
for (x in fctNumList) {
plotFactorNumeric(mod2016Data,
fctVar=x[1],
numVar=x[2],
subT="Red dots are the overall average",
showXLabel=FALSE,
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE
)
}
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
The cross-locale comparisons bring out a few salient points:
WIND SPEED BY MONTH:
* The plot is too busy; need to rethink, potentially by removing the outliers and plotting only the box
ALTIMETER BY MONTH:
* Same as above; this plot is too busy
TEMPERATURE BY MONTH:
* Seasonal patterns are observed in most of the data, with a warm season centered around July and a cold season centered around December
* Las Vegas stand out for running warmer than the overall average in every month
* Houston, New Orleans, and San Diego stand out for running warmer than the overall average in the cold season and similar to the overall average in the warm season
DEW POINT BY MONTH:
* Seasonal patterns are observed in most of the data, with the humid seasons tracking with the warm seasons
* Houston and New Orleans run consistently above the average dew point by month
* San Diego runs above the average dew point during the cold season
* Las Vegas runs below the average dew point during the warm season
VISIBILITY BY MONTH:
* Visibilities are overwhelmingly likely to be 10 SM
* The Newark, NJ outlier described earlier (19 SM) should be deleted
* Detroit is especially likely to have Visibility less than 10 SM
* Grand Rapids and Traverse City have meaningul occurences of Visibility less than 10 SM during the cold season
WIND SPEED BY SKY OBSCURATION:
* This chart is too busy as per above
ALTIMETER BY SKY OBSCURATION:
* Not much at a glance
VISIBILITY BY SKY OBSCURATION:
* The VV and OVC sky obscurations are most associated with low visibilities; VV in particular is almost always associated with very low visibility
WIND SPEED BY WIND DIRECTION:
* Wind direction “000” is always associated with wind speed 0, as expected
* Wind direction “VRB” is always associated with a low but non-zero wind speed, as expected
* While some cities are windier than others, there is no pronounced tendency for wind speed to be highly associated with a given wind direction in any locale
ALTIMETER BY WIND DIRECTION:
* No gross trends observed
* The plot is rather busy, especially given the low variance in median/IQR for altimeter relative to the outlier points
TEMPERATURE/DEW POINT BY WIND DIRECTION:
* Plot is not great for reading and interpreting
Next steps are to modify a few of the plots for better interpretability (less busy, more variation of the core data metric relative to the full y-axis, etc.), investigate and rectify the data issues observed, and save a version of the file for further analysis.
A modified boxplot function is created to plot only the median and the IQR, with the goal of having more of the variance in the median visible on the plot:
plotMedianIQR <- function(met,
fctVar,
numVar,
title=NULL,
subT="",
mid=0.5,
rng=c(0.25, 0.75),
diagnose=TRUE,
showXLabel=TRUE,
mapper=varMapper,
facetOn=NULL,
showCentral=FALSE,
ylimits=NULL
) {
# Function arguments
# met: dataframe or tibble containing raw data
# fctVar: character vector of variable to be used for the x-axis (factor in the boxplot)
# numVar: character vector of variable to be used for the y-axis (numeric in the boxplot)
# title: character vector for plot title
# subT: character vector for plot subtitle
# mid: float between 0 and 1 for the quantile to be used as the midpoint
# rng: length-two float vector for (lo, hi) to be used as the dimensions of the box
# diagnose: boolean for whether to note in the log the number of NA observations dropped
# showXLabel: boolean for whether to include the x-label (e.g., set to FALSE if using 'month')
# mapper: named list containing mapping from variable name to well-formatted name for titles and axes
# facetOn: a facetting variable for the supplied met (NULL for no faceting)
# showCentral: boolean for whether to show the central tendency over-plotted on the main data
# ylimits: length-two numeric for the y-axis minimum and maximum (default NULL uses plot defaults)
# Function usage
# 1. By default, the function creates a modified boxplot of numVar by fctVar - line at mid, box going from rng[1] to rng[2]
# 2. If facetOn is passed as a non-NULL, then the data in #1 will be facetted by facetOn
# 3. If showCentral=TRUE, then the overall median of numVar by fctVar will be plotted as a red dot
# Check that the quantile variables are sensible
if (length(mid) != 1 | length(rng) != 2) {
stop("Must pass a single value as mid and a length-two vector as rng\n")
}
if (min(c(mid, rng)) < 0 | max(c(mid, rng)) > 1) {
stop("All values of mid and rng must be between 0 and 1, inclusive\n")
}
if ((mid < rng[1]) | (mid > rng[2])) {
stop("mid must be at least as big as rng[1] and no greater than rng[2]\n")
}
quants <- paste0(round(100*c(rng[1], mid, rng[2]), 0), "%", collapse=" ")
# Create the title if not passed
if (is.null(title)) {
title <- paste0("Hourly Observations of ", mapper[numVar], " by ", mapper[fctVar])
}
# Remove the NA variables
nOrig <- nrow(met)
dat <- met %>%
filter(!is.na(get(fctVar)), !is.na(get(numVar)))
if (diagnose) { cat("\nRemoving", nOrig-nrow(dat), "records due to NA\n") }
# Create the quantile data by fctVar and (if passed) facetOn
groupVars <- fctVar
if (!is.null(facetOn)) { groupVars <- c(groupVars, facetOn) }
plotData <- dat %>%
group_by_at(groupVars) %>%
summarize(midPoint=quantile(get(numVar), probs=mid),
loPoint=quantile(get(numVar), probs=rng[1]),
hiPoint=quantile(get(numVar), probs=rng[2])
)
# Create the base plot
p <- plotData %>%
ggplot(aes_string(x=fctVar, y="midPoint")) +
geom_crossbar(aes(ymin=loPoint, ymax=hiPoint), fill="lightblue") +
labs(title=title,
subtitle=subT,
x=ifelse(showXLabel, paste0(mapper[fctVar], " - ", fctVar), ""),
y=paste0(mapper[numVar], " - ", numVar),
caption=paste0("Quantiles plotted: ", quants)
)
# If facetting has been requested, facet by the desired variable
if (!is.null(facetOn)) {
p <- p + facet_wrap(as.formula(paste("~", facetOn)))
}
# If showCentral=TRUE, add a dot plot for the overall value of 'mid'
if (showCentral) {
centData <- helperFactorNumeric(dat, .f=quantile, byVar=fctVar, numVar=numVar, probs=mid)
p <- p + geom_point(data=centData, aes(y=helpFN), size=2, color="red")
}
# If ylim has been passed, use it
if (!is.null(ylimits)) {
p <- p + ylim(ylimits)
}
# Render the final plot
print(p)
}
The function can then be applied in an attempt to get a better look at a few of the comparisons:
# Key factor variables include month, wType, predomDir
# Key numeric variables include WindSpeed, Altimeter, TempF, DewF, Visibility
fctNumListIQR <- list(c("month", "WindSpeed"),
c("month", "Altimeter"),
c("wType", "WindSpeed"),
c("predomDir", "Altimeter"),
c("predomDir", "TempF"),
c("predomDir", "DewF")
)
for (x in fctNumListIQR) {
plotMedianIQR(mod2016Data,
fctVar=x[1],
numVar=x[2],
subT="Red dots are the overall mid-quantile",
showXLabel=FALSE,
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE
)
}
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
##
## Removing 593 records due to NA
Limiting the observations to Q1-Median-Q2 brings out a bit more information in the plots:
WIND SPEED BY MONTH:
* San Diego and Las Vegas are meaningfully less windy than the median, especially during the cold season
ALTIMETER BY MONTH:
* San Diego has very little variance in altimeter relative to the other locales
* Las Vegas runs especially low on altimeter during the warm season
* There is a seasonal pattern where altimeters tend to run lower during the warm season while also showing a much smaller Q1-Q3 range
WIND SPEED BY SKY OBSCURATION:
* Newark shows very high wind speeds when obscuration is VV
* Should re-run with data y-axis capped at 15 to visualize the other locales
ALTIMETER BY WIND DIRECTION:
* Altimeters tend to be a bit lower when winds are from the S-SW-W-NW, a pattern that seems consistent across all locales
* Las Vegas runs especially low on altimeter relative to other locales when winds are from E-SE-S
TEMPERATURE BY WIND DIRECTION:
* Variable winds are associated with higher temperatures than calm (speed 0) winds
* Winds from W-NW are generally associated with somewhat lower temperatures
DEW POINT BY WIND DIRECTION:
* Winds from SW-W-NW-N are associated with generally lower dew points than winds from NE-E-SE-S
* Las Vegas typically runs low on dew points while Houston and New Orleans typically run high on dew points
* San Diego has little variance in dew points and does not see any dip when winds are from SW-W-NW-N
The wind speed vs. sky obscuration plot is re-run with y-limits that suppress Newark VV:
plotMedianIQR(mod2016Data,
fctVar="wType",
numVar="WindSpeed",
subT="Red dots are the overall mid-quantile",
showXLabel=FALSE,
diagnose=TRUE,
facetOn="sourceName",
showCentral=TRUE,
ylimits=c(0, 15)
)
##
## Removing 593 records due to NA
## Warning: Removed 1 rows containing missing values (geom_crossbar).
Low vertical visibilities (VV) are generally associated with lower winds. There is also a slight tendency for more obscured skies (OVC) to be associated with slightly stronger winds.
Further, the previous analysis for counts by wind direction is re-run using the predominant directions:
# Counts by Metric for predomDir using mod2016Data
plotcountsByMetric(mod2016Data,
mets=c("predomDir"),
title="Comparisons Across Locales (red dots are the median)",
facetOn="sourceName",
showCentral=TRUE
)
Findings include:
* Across all locales, winds are more commonly from S-SW-W-NW-N than from NE-E-SE
* San Diego is especially likely to have either 0 wind or wind from W-NW
* Traverse City is especially likely to have 0 wind or variable wind (very surprising, and possibly a sign of anomalous data)
* Minneapolis is especially likely to experience winds from NE
* New Orleans and Houston rarely experience winds from SW-W-NW relative to other locales
* Lincoln and Las Vegas are especially pronse to winds from S (and for Las Vegas SW)
There are several issues identified that should be explored and fixed if appropriate:
Cloud exploration can highlight whether all locales are being compared apples to apples. This is especially the case if some locales have a different maximum sensor height:
# Select the source, sourceName, dtime and cLevel variables; pivot cLevel down
cLevels <- mod2016Data %>%
select(source, sourceName, dtime, starts_with("cLevel")) %>%
pivot_longer(-c(source, sourceName, dtime), names_to="level", values_to="height") %>%
mutate(level=as.integer(str_replace(level, pattern="cLevel", replacement="")))
# Select the source, sourceName, dtime and cType variables; pivot cLevel down
cTypes <- mod2016Data %>%
select(source, sourceName, dtime, starts_with("cType")) %>%
pivot_longer(-c(source, sourceName, dtime), names_to="level", values_to="type") %>%
mutate(level=as.integer(str_replace(level, pattern="cType", replacement="")))
cData <- cLevels %>%
inner_join(cTypes, by=c("source", "sourceName", "dtime", "level"))
# Plot cloud heights, using only non-NA
cData %>%
filter(!is.na(height)) %>%
ggplot(aes(x=fct_reorder(sourceName, height, .fun=max, na.rm=TRUE), y=height)) +
geom_violin(fill="lightblue") +
coord_flip() +
labs(x="", y="Cloud Height (feet)", title="Density of cloud heights by locale")
There are clearly differences in maximum cloud height recorded by locale:
The distribution of cloud types observed can also be assessed:
# Plot cloud heights, using only non-""
fctLayers <- c("VV", "OVC", "BKN", "SCT", "FEW")
cData %>%
filter(type!="") %>%
mutate(type=factor(type, levels=fctLayers)) %>%
ggplot(aes(x=fct_reorder(sourceName, height, .fun=max, na.rm=TRUE), fill=type)) +
geom_bar(position="stack") +
coord_flip() +
labs(x="", y="Cloud Layer Obscuration", title="Cloud obscuration by locale") +
scale_fill_discrete("", rev(fctLayers)) +
theme(legend.position="bottom")
Lincoln and Green Bay stand out for having fewer clouds, likely due to the inability to catch the higher altocumulus and any cirrus cloud (both very common in the mid-latitudes).
Supposing that only clouds of 12,000 feet and under are considered:
# Plot cloud heights, using only non-NA
cData %>%
filter(!is.na(height)) %>%
filter(height <= 12000) %>%
ggplot(aes(x=sourceName, y=height)) +
geom_violin(fill="lightblue") +
coord_flip() +
labs(x="", y="Cloud Height (feet)", title="Density of cloud heights by locale")
cData %>%
filter(type!="") %>%
filter(height <= 12000) %>%
mutate(type=factor(type, levels=fctLayers)) %>%
ggplot(aes(x=fct_reorder(sourceName, sourceName, .fun=length), fill=type)) +
geom_bar(position="stack") +
coord_flip() +
labs(x="", y="Cloud Layer Obscuration", title="Cloud obscuration by locale (up to 12,000 feet)") +
scale_fill_discrete("", rev(fctLayers)) +
theme(legend.position="bottom")
The patterns are much more plausible:
Further investigation of the cloud data may be interesting.
A function is written to take only cloud data up through height x, and to add a layer of clouds that are “clear” at a height that is out-of-interval:
# Filter to only clouds up to and including height
cloudsLevel0 <- function(df,
maxHeight,
byVars,
baseLevel=0,
heightBase=-100,
typeBase="CLR"
) {
# Function assumptions
# Input data are unique by byVars-level and with columns 'height' and 'type'
# Clouds increase in height with level
# Clouds are non-decreasing in type (VV > OVC > BKN > SCT > FEW) with level
# FUNCTION ARGUMENTS:
# df: tibble or dataframe contiaining the clouds data
# maxHeight: the maximum height to consider (delete all heights above this level)
# byVars: the variables that make up a unique observation (df should be unique by byVars-level)
# baseLevel: the level to be created as the base level (by default, level 0)
# heightBase: the height to be provided to the base level (by default, -100 feet)
# typeBase: the type of obscuration observed at the base level (by default, CLR)
# Add a cloud level 0 that has height -100 (by default)
# Include only levels where the cloud height is not NA
# Include only levels where the cloud height is less than or equal to maxHeight
modData <- df %>%
group_by_at(vars(all_of(byVars))) %>%
summarize(level=baseLevel, height=heightBase, type=typeBase) %>%
ungroup() %>%
bind_rows(df) %>%
arrange_at(vars(all_of(c(byVars, "level")))) %>%
filter(!is.na(height)) %>%
filter(height <= maxHeight)
modData
}
The function is then run using the existing cData:
modCData <- cloudsLevel0(cData, maxHeight=12000, byVars=c("source", "sourceName", "dtime"))
modCData <- modCData %>%
mutate(type=factor(type, levels=c("VV", "OVC", "BKN", "SCT", "FEW", "CLR")))
modCData
## # A tibble: 260,324 x 6
## source sourceName dtime level height type
## <chr> <chr> <dttm> <dbl> <dbl> <fct>
## 1 kdtw_2016 Detroit, MI (2016) 2015-12-31 00:53:00 0 -100 CLR
## 2 kdtw_2016 Detroit, MI (2016) 2015-12-31 00:53:00 1 2500 BKN
## 3 kdtw_2016 Detroit, MI (2016) 2015-12-31 00:53:00 2 5000 OVC
## 4 kdtw_2016 Detroit, MI (2016) 2015-12-31 01:53:00 0 -100 CLR
## 5 kdtw_2016 Detroit, MI (2016) 2015-12-31 01:53:00 1 2100 OVC
## 6 kdtw_2016 Detroit, MI (2016) 2015-12-31 02:53:00 0 -100 CLR
## 7 kdtw_2016 Detroit, MI (2016) 2015-12-31 02:53:00 1 2100 OVC
## 8 kdtw_2016 Detroit, MI (2016) 2015-12-31 03:53:00 0 -100 CLR
## 9 kdtw_2016 Detroit, MI (2016) 2015-12-31 03:53:00 1 2500 OVC
## 10 kdtw_2016 Detroit, MI (2016) 2015-12-31 04:53:00 0 -100 CLR
## # ... with 260,314 more rows
An additional function is written to take processed cloud data and to designate 1) the minimum cloud height, 2) the minimum cloud ceiling (a ceiling exists with BKN, OVC, or VV layers), and 3) the maximum obscuration level (using CLR if no clouds exist in that area):
# Helper function to pull out the minimum cloud height, limited to certain obscurations
getMinimumHeight <- function(df,
byVars,
types,
baseLevel=0
) {
# Split the data in to the baseLevel and all other levels
baseData <- df %>%
filter(level==baseLevel)
layerData <- df %>%
filter(level!=baseLevel)
# Take the layerData, limit to type in types, and find the minimum level
layerData <- layerData %>%
filter(type %in% types) %>%
group_by_at(vars(all_of(byVars))) %>%
filter(level==min(level)) %>%
ungroup()
# Put the data back together
# Keep the maximum level for each set of byVars (will be 0 if no data for byVars in layerData)
cloudData <- baseData %>%
bind_rows(layerData) %>%
arrange_at(vars(all_of(c(byVars, "level")))) %>%
group_by_at(vars(all_of(byVars))) %>%
filter(level==max(level))
}
# Extract the minimum cloud height, minimum ceiling height, and maximum obscuration
hgtCeilObsc <- function(df,
byVars,
baseLevel=0
) {
# Function assumptions
# Input data are unique by byVars-level and with columns 'height' and 'type'
# For each byVars, a row with level=baseLevel, height=heightBase, type=typeBase has been created
# Clouds increase in height with level
# Clouds are non-decreasing in type (VV > OVC > BKN > SCT > FEW) with level
# FUNCTION ARGUMENTS:
# df: tibble or dataframe contiaining the clouds data
# byVars: the variables that make up a unique observation (df should be unique by byVars-level)
# baseLevel: the base level in df (by default, level 0)
# heightBase: the height of the base level in df (by default, -100 feet)
# typeBase: the type of obscuration at the base level in df (by default, CLR)
# Get the maximum obscuration
maxObsc <- df %>%
group_by_at(vars(all_of(byVars))) %>%
filter(level==max(level)) %>%
ungroup()
# Get the minimum height (any type)
minHeight <- getMinimumHeight(df,
byVars=byVars,
types=c("VV", "OVC", "BKN", "SCT", "FEW"),
baseLevel=baseLevel
)
# Get the minimum ceiling height (VV, OVC, BKN)
minCeiling <- getMinimumHeight(df,
byVars=byVars,
types=c("VV", "OVC", "BKN"),
baseLevel=baseLevel
)
# Put the file together
minCeiling <- minCeiling %>%
rename(ceilingHeight=height, ceilingType=type, ceilingLevel=level)
minHeight <- minHeight %>%
rename(cloudHeight=height, cloudType=type, cloudLevel=level)
maxObsc <- maxObsc %>%
rename(obscHeight=height, obscType=type, obscLevel=level)
# Merge
cloudSummary <- maxObsc %>%
full_join(minHeight, by=byVars) %>%
full_join(minCeiling, by=byVars)
cloudSummary
}
The function can then be run on the modified clouds data:
# Get the key clouds data
cloudSummary <- hgtCeilObsc(modCData, byVars=c("source", "sourceName", "dtime"))
# Check for consistency
cloudSummary %>%
count(ceilingType, obscType)
## # A tibble: 7 x 3
## ceilingType obscType n
## <fct> <fct> <int>
## 1 VV VV 761
## 2 OVC OVC 22654
## 3 BKN OVC 10054
## 4 BKN BKN 17198
## 5 CLR SCT 12456
## 6 CLR FEW 22711
## 7 CLR CLR 46309
cloudSummary %>%
count(cloudType, ceilingType)
## # A tibble: 10 x 3
## cloudType ceilingType n
## <fct> <fct> <int>
## 1 VV VV 761
## 2 OVC OVC 17900
## 3 BKN BKN 15436
## 4 SCT OVC 1848
## 5 SCT BKN 4936
## 6 SCT CLR 8243
## 7 FEW OVC 2906
## 8 FEW BKN 6880
## 9 FEW CLR 26924
## 10 CLR CLR 46309
cloudSummary %>%
count(cloudType, obscType)
## # A tibble: 12 x 3
## cloudType obscType n
## <fct> <fct> <int>
## 1 VV VV 761
## 2 OVC OVC 17900
## 3 BKN OVC 6384
## 4 BKN BKN 9052
## 5 SCT OVC 3444
## 6 SCT BKN 3340
## 7 SCT SCT 8243
## 8 FEW OVC 4980
## 9 FEW BKN 4806
## 10 FEW SCT 4213
## 11 FEW FEW 22711
## 12 CLR CLR 46309
Plots for the maximum obscuration (through 12,000 feet) can then be created:
plotMaxObsc <- function(df,
xVar,
fillVar,
title,
subtitle="Up to and including 12,000 feet",
orderByVariable=NULL,
orderByValue=NULL,
posnBar="stack",
yLabel="# Hourly Observations",
legendLabel="",
facetOn=NULL
) {
# Get the levels to be used
cLevels <- levels(df %>% pull(fillVar))
# Create the main plot
p1 <- df %>%
ggplot(aes_string(fill=fillVar))
if (!is.null(orderByVariable)) {
p1 <- p1 +
geom_bar(aes(x=fct_reorder(get(xVar), get(orderByVariable)==orderByValue, .fun=sum)),
position=posnBar
)
} else {
p1 <- p1 +
geom_bar(aes_string(x=xVar), position=posnBar)
}
p1 <- p1 +
coord_flip() +
labs(x="",
y=yLabel,
title=title,
subtitle=subtitle
) +
theme(legend.position="bottom") +
scale_fill_discrete(legendLabel, rev(cLevels)) +
guides(fill=guide_legend(nrow=1))
if (!is.null(facetOn)) {
p1 <- p1 + facet_wrap(as.formula(paste("~", facetOn)))
}
print(p1)
}
# Cloud obscuration by source
plotMaxObsc(cloudSummary,
xVar="sourceName",
fillVar="obscType",
title="Maximum Cloud Obscuration",
orderByVariable="obscType",
orderByValue="CLR"
)
# Cloud obscuration by month
cloudSummary <- cloudSummary %>%
mutate(month=lubridate::month(dtime),
hour=lubridate::hour(dtime),
monthfct=factor(month.abb[month], levels=month.abb[1:12])
)
plotMaxObsc(cloudSummary,
xVar="monthfct",
fillVar="obscType",
title="Maximum Cloud Obscuration",
posnBar="fill"
)
A few salient observations stand out about the maximum obscuration level:
Groups of cities can be examined, faceted by month:
cityCloudList <- list(c("klas_2016", "ksan_2016", "kiah_2016", "kmsy_2016"),
c("kgrb_2016", "kgrr_2016", "kdtw_2016", "ktvc_2016"),
c("klnk_2016", "kmsp_2016", "kmsn_2016", "kind_2016"),
c("kmke_2016", "kord_2016", "kewr_2016")
)
for (x in cityCloudList) {
cloudUse <- cloudSummary %>%
filter(source %in% x)
plotMaxObsc(cloudUse,
xVar="monthfct",
fillVar="obscType",
title="Maximum Cloud Obscuration",
facetOn="sourceName",
posnBar="fill"
)
}
A few findings include:
The ceiling heights can also be assessed:
cloudSummary <- cloudSummary %>%
mutate(ceilFactor=factor(case_when(ceilingHeight == -100 ~ "None",
ceilingHeight <= 1000 ~ "0-1000",
ceilingHeight <= 3000 ~ "1000-3000",
ceilingHeight <= 6000 ~ "3000-6000",
ceilingHeight <= 12000 ~ "6000-12000"
),
levels=c("None", "6000-12000", "3000-6000", "1000-3000", "0-1000")
)
)
plotMaxObsc(cloudSummary,
xVar="sourceName",
fillVar="ceilFactor",
title="Ceiling Height",
orderByVariable="ceilFactor",
orderByValue="None"
)
for (x in cityCloudList) {
cloudUse <- cloudSummary %>%
filter(source %in% x)
plotMaxObsc(cloudUse,
xVar="monthfct",
fillVar="ceilFactor",
title="Ceiling Height (proportion of hourly observations)",
facetOn="sourceName",
posnBar="fill",
yLabel="",
legendLabel="Ceiling Height (feet)"
)
}
Findings for ceiling height broadly line up with findings for maximum cloud obscuration, as expected:
The same analysis can be run for the minimum cloud height:
cloudSummary <- cloudSummary %>%
mutate(minCFactor=factor(case_when(cloudHeight == -100 ~ "None",
cloudHeight <= 1000 ~ "0-1000",
cloudHeight <= 3000 ~ "1000-3000",
cloudHeight <= 6000 ~ "3000-6000",
cloudHeight <= 12000 ~ "6000-12000"
),
levels=c("None", "6000-12000", "3000-6000", "1000-3000", "0-1000")
)
)
plotMaxObsc(cloudSummary,
xVar="sourceName",
fillVar="minCFactor",
title="Minimum Cloud Height",
orderByVariable="minCFactor",
orderByValue="None"
)
for (x in cityCloudList) {
cloudUse <- cloudSummary %>%
filter(source %in% x)
plotMaxObsc(cloudUse,
xVar="monthfct",
fillVar="minCFactor",
title="Minimum Cloud Height (proportion of hourly observations)",
facetOn="sourceName",
posnBar="fill",
yLabel="",
legendLabel="Minimum Cloud Height (feet)"
)
}
At a glance, there seem to be some similarities and differences among the locales:
A more analytical approach would look at the distance between the various locales. Broadly speaking, distance can be calculated based on counts by month by locale. No scaling is performed since each location has roughly the same number of observations by month, and the intent is to find macro similarities (e.g., if 1 city has 2% data in bucket x and the others all have 0% data in bucket x, this analysis would treat that as a minor difference where with scaled data it would be a primary difference driver).
There are three main variable to consider for distance:
Distances can be calculated using any or all of these variables. A function is created to that process can be repeated:
findCloudDist <- function(df,
byVar,
fctVar,
pivotVar="monthfct",
scaleDistData=FALSE,
returnPivotOnly=FALSE
) {
# FUNCTION ARGUMENTS
# df: data frame or tibble containing one record per locale and time
# byVar: the variable(s) by which the final distance file should be unique
# fctVar: the factor variables to be counted up, with the sums being the distance inputs
# pivotVar: the variable(s) which should be pivoted in to columns to make the file unique by byVar
# scale: whether to scale the data prior to calculating distances
# returnDistMatrix: whether to just return the pivoted data frame (default, FALSE, returns the distance matrix calculated from this pivoted data frame rather than the data frame itself)
# Create the counts data by byVar, pivoted by fctVar and pivotVar
baseData <- df %>%
group_by_at(vars(all_of(c(byVar, fctVar, pivotVar)))) %>%
summarize(n=n()) %>%
ungroup() %>%
pivot_wider(names_from=all_of(c(fctVar, pivotVar)), values_from="n") %>%
mutate_if(is.numeric, tidyr::replace_na, replace=0)
# If the data are only yo be pivoted, return and exit the function
if (returnPivotOnly) {
return(baseData)
}
# Split in to descriptors and data
descData <- baseData %>%
select_at(vars(all_of(byVar))) %>%
mutate(rowN=row_number())
distData <- baseData %>%
select_at(vars(-any_of(byVar))) %>%
as.matrix()
# Scale distdata if requested
if (scaleDistData) {
distData <- scale(distData)
}
# Create the distances, convert back to data frame, pivot_longer, and attach the labels
dist(distData) %>%
as.matrix() %>%
as_tibble() %>%
mutate(row1=row_number()) %>%
pivot_longer(-row1, names_to="row2", values_to="dist") %>%
mutate(row2=as.integer(row2)) %>%
inner_join(descData, by=c("row1"="rowN")) %>%
rename_at(vars(all_of(byVar)), ~paste0(., "_1")) %>%
inner_join(descData, by=c("row2"="rowN")) %>%
rename_at(vars(all_of(byVar)), ~paste0(., "_2"))
}
The function is then run for the obscuration data as an example:
# Find the distances for obscType by locale vs. locale
obscDist <- findCloudDist(cloudSummary,
byVar=c("source", "sourceName"),
fctVar="obscType",
pivotVar="monthfct"
)
obscDist
## # A tibble: 225 x 7
## row1 row2 dist source_1 sourceName_1 source_2 sourceName_2
## <int> <int> <dbl> <chr> <chr> <chr> <chr>
## 1 1 1 0 kdtw_2016 Detroit, MI (2016) kdtw_20~ Detroit, MI (2016)
## 2 1 2 627. kdtw_2016 Detroit, MI (2016) kewr_20~ Newark, NJ (2016)
## 3 1 3 669. kdtw_2016 Detroit, MI (2016) kgrb_20~ Green Bay, WI (2016)
## 4 1 4 294. kdtw_2016 Detroit, MI (2016) kgrr_20~ Grand Rapids, MI (20~
## 5 1 5 726. kdtw_2016 Detroit, MI (2016) kiah_20~ Houston, TX (2016)
## 6 1 6 347. kdtw_2016 Detroit, MI (2016) kind_20~ Indianapolis, IN (20~
## 7 1 7 1421. kdtw_2016 Detroit, MI (2016) klas_20~ Las Vegas, NV (2016)
## 8 1 8 1092. kdtw_2016 Detroit, MI (2016) klnk_20~ Lincoln, NE (2016)
## 9 1 9 361. kdtw_2016 Detroit, MI (2016) kmke_20~ Milwaukee, WI (2016)
## 10 1 10 345. kdtw_2016 Detroit, MI (2016) kmsn_20~ Madison, WI (2016)
## # ... with 215 more rows
A function can then be created to plot the distances as a heamap and to return the minimum distance for each locale:
plotCloudDist <- function(df,
var1="sourceName_1",
var2="sourceName_2",
met="dist",
roundDist=0,
title="Distance Between Locales",
subT=""
) {
# FUNCTION ARGUMENTS:
# df: tibble or data frame containing distance data
# var1: the variable containing the first locale
# var2: the variable containing the second locale
# dist: the variable containing the pre-calculated distance between var1 and var2
# roundDist: the rounding for the distance in the plot
# Process the data frame and exclude any occurences of distance to self
distData <- df %>%
select_at(vars(all_of(c(var1, var2, met)))) %>%
filter(get(var1) != get(var2))
# Get the locales by minimum distance to any other locale
distHiLo <- distData %>%
group_by_at(vars(all_of(var1))) %>%
filter(dist==min(dist)) %>%
arrange(dist) %>%
pull(var1)
# Create a heatmap of the distances
distData %>%
ggplot(aes(x=factor(get(var1), levels=distHiLo), y=factor(get(var2), levels=distHiLo))) +
geom_tile(aes(fill=dist)) +
geom_text(aes(label=round(dist, roundDist)), color="lightblue") +
labs(x="", y="", title=title, subtitle=subT) +
scale_fill_gradient("Distance", low="black", high="white") +
theme(axis.text.x=element_text(angle=90))
}
The process can then be run on the obscuration data as a check:
plotCloudDist(obscDist, subT="Based on % of obscuration type by month")
FINDINGS FOR OBSCURATION DISTANCE:
This is suggestive that there is a cold-weather cluster, a hot and humid cluster, and then several locales that are more or less standalone but could be grouped loosely to each other.
The functions can then be run on variations of the data:
# Run for minCFactor
findCloudDist(cloudSummary,
byVar=c("source", "sourceName"),
fctVar="minCFactor",
pivotVar="monthfct"
) %>%
plotCloudDist(subT="Based on % in each minimum cloud height bucket by month")
# Run for ceilFactor
findCloudDist(cloudSummary,
byVar=c("source", "sourceName"),
fctVar="ceilFactor",
pivotVar="monthfct"
) %>%
plotCloudDist(subT="Based on % in each ceiling height bucket by month")
Findings by minimum cloud height are similar to findings by obscuration.
Findings by ceiling height show that Lincoln and Newark are close to each other and that San Diego and Las Vegas are both segments of one. Traverse City and Grand Rapids are close to each other; as are New Orleans and Houston. There continues to be a large, close, segment of cold cities.
Of interest is whether a simple kmeans analysis returns the same findings:
set.seed(2006040940)
ceilDistData <- findCloudDist(cloudSummary,
byVar=c("source", "sourceName"),
fctVar="ceilFactor",
pivotVar="monthfct",
returnPivotOnly=TRUE
)
tibble::tibble(locale=ceilDistData$sourceName,
cluster=kmeans(dist(ceilDistData[3:ncol(ceilDistData)]), centers=5, nstart=1000)$cluster
) %>%
arrange(-cluster)
## # A tibble: 15 x 2
## locale cluster
## <chr> <int>
## 1 Grand Rapids, MI (2016) 5
## 2 Traverse City, MI (2016) 5
## 3 Las Vegas, NV (2016) 4
## 4 Newark, NJ (2016) 3
## 5 Houston, TX (2016) 3
## 6 Lincoln, NE (2016) 3
## 7 New Orleans, LA (2016) 3
## 8 Detroit, MI (2016) 2
## 9 Green Bay, WI (2016) 2
## 10 Indianapolis, IN (2016) 2
## 11 Milwaukee, WI (2016) 2
## 12 Madison, WI (2016) 2
## 13 Minneapolis, MN (2016) 2
## 14 Chicago, IL (2016) 2
## 15 San Diego, CA (2016) 1
The five-segment k-means is consistent - solo segments for San Diego and Las Vegas; a Grand Rapids-Traverse City segment; a cold-weather midwestern segment (Grand Rapids and Traverse City are on the downwind side of Lake Michigan and would be expected to see much different clouds than cities with similar temperatures); and a catch-all segment.
The data can also be assessed using hierarchical clustering:
hclust(dist(ceilDistData[3:ncol(ceilDistData)]), method="complete") %>%
plot(labels=ceilDistData$sourceName, cex=0.5, main="Hierarchical on Ceiling Height: method=complete")
hclust(dist(ceilDistData[3:ncol(ceilDistData)]), method="single") %>%
plot(labels=ceilDistData$sourceName, cex=0.5, main="Hierarchical on Ceiling Height: method=single")
Similar conclusions can be drawn from hierarchical clustering based on ceiling height buckets by month:
This is suggestive of either 6 or 7 segments in this data. Does k-means match up?
tibble::tibble(locale=ceilDistData$sourceName,
cluster=kmeans(dist(ceilDistData[3:ncol(ceilDistData)]), centers=6, nstart=1000)$cluster
) %>%
arrange(-cluster)
## # A tibble: 15 x 2
## locale cluster
## <chr> <int>
## 1 Las Vegas, NV (2016) 6
## 2 San Diego, CA (2016) 5
## 3 Newark, NJ (2016) 4
## 4 Lincoln, NE (2016) 4
## 5 Houston, TX (2016) 3
## 6 New Orleans, LA (2016) 3
## 7 Grand Rapids, MI (2016) 2
## 8 Traverse City, MI (2016) 2
## 9 Detroit, MI (2016) 1
## 10 Green Bay, WI (2016) 1
## 11 Indianapolis, IN (2016) 1
## 12 Milwaukee, WI (2016) 1
## 13 Madison, WI (2016) 1
## 14 Minneapolis, MN (2016) 1
## 15 Chicago, IL (2016) 1
tibble::tibble(locale=ceilDistData$sourceName,
cluster=kmeans(dist(ceilDistData[3:ncol(ceilDistData)]), centers=7, nstart=1000)$cluster
) %>%
arrange(-cluster)
## # A tibble: 15 x 2
## locale cluster
## <chr> <int>
## 1 Houston, TX (2016) 7
## 2 New Orleans, LA (2016) 7
## 3 Detroit, MI (2016) 6
## 4 Green Bay, WI (2016) 6
## 5 Minneapolis, MN (2016) 6
## 6 Newark, NJ (2016) 5
## 7 Lincoln, NE (2016) 5
## 8 Grand Rapids, MI (2016) 4
## 9 Traverse City, MI (2016) 4
## 10 Indianapolis, IN (2016) 3
## 11 Milwaukee, WI (2016) 3
## 12 Madison, WI (2016) 3
## 13 Chicago, IL (2016) 3
## 14 San Diego, CA (2016) 2
## 15 Las Vegas, NV (2016) 1
Very nicely aligned. The addition of the sixth segment splits the catch-all segment (from the previous 5 cluster analysis) in to a Houston-New Orleans segment and a Lincoln-Newark segment. The addition of the seventh segment splits the cold weather cities in to Chicago-Madison-Milwukee-Indianapolis; and Detroit-Green Bay-Minneapolis.
Are any differences observed in trends for minimum cloud height?
heightDistData <- findCloudDist(cloudSummary,
byVar=c("source", "sourceName"),
fctVar="minCFactor",
pivotVar="monthfct",
returnPivotOnly=TRUE
)
hclust(dist(heightDistData[3:ncol(heightDistData)]), method="complete") %>%
plot(labels=heightDistData$sourceName, cex=0.5,
main="Hierarchical on Minimum Cloud Height: method=complete"
)
hclust(dist(heightDistData[3:ncol(heightDistData)]), method="single") %>%
plot(labels=heightDistData$sourceName, cex=0.5,
main="Hierarchical on Minimum Cloud Height: method=single"
)
There are some differences when using the ‘complete’ linkage method:
When using the ‘single’ linkage method, these patterns become muted:
We would expect k-means to pull out 1-2 cold-weather clusters and a Houston-New Orleans cluster:
set.seed(2006041003)
tibble::tibble(locale=heightDistData$sourceName,
cluster=kmeans(dist(heightDistData[3:ncol(heightDistData)]), centers=5, nstart=1000)$cluster
) %>%
arrange(-cluster)
## # A tibble: 15 x 2
## locale cluster
## <chr> <int>
## 1 Las Vegas, NV (2016) 5
## 2 Lincoln, NE (2016) 5
## 3 Houston, TX (2016) 4
## 4 New Orleans, LA (2016) 4
## 5 San Diego, CA (2016) 3
## 6 Detroit, MI (2016) 2
## 7 Grand Rapids, MI (2016) 2
## 8 Indianapolis, IN (2016) 2
## 9 Milwaukee, WI (2016) 2
## 10 Madison, WI (2016) 2
## 11 Minneapolis, MN (2016) 2
## 12 Chicago, IL (2016) 2
## 13 Traverse City, MI (2016) 2
## 14 Newark, NJ (2016) 1
## 15 Green Bay, WI (2016) 1
tibble::tibble(locale=heightDistData$sourceName,
cluster=kmeans(dist(heightDistData[3:ncol(heightDistData)]), centers=6, nstart=1000)$cluster
) %>%
arrange(-cluster)
## # A tibble: 15 x 2
## locale cluster
## <chr> <int>
## 1 Houston, TX (2016) 6
## 2 New Orleans, LA (2016) 6
## 3 Newark, NJ (2016) 5
## 4 Green Bay, WI (2016) 5
## 5 San Diego, CA (2016) 4
## 6 Detroit, MI (2016) 3
## 7 Grand Rapids, MI (2016) 3
## 8 Indianapolis, IN (2016) 3
## 9 Milwaukee, WI (2016) 3
## 10 Madison, WI (2016) 3
## 11 Minneapolis, MN (2016) 3
## 12 Chicago, IL (2016) 3
## 13 Traverse City, MI (2016) 3
## 14 Lincoln, NE (2016) 2
## 15 Las Vegas, NV (2016) 1
tibble::tibble(locale=heightDistData$sourceName,
cluster=kmeans(dist(heightDistData[3:ncol(heightDistData)]), centers=7, nstart=1000)$cluster
) %>%
arrange(-cluster)
## # A tibble: 15 x 2
## locale cluster
## <chr> <int>
## 1 Indianapolis, IN (2016) 7
## 2 Milwaukee, WI (2016) 7
## 3 Madison, WI (2016) 7
## 4 Minneapolis, MN (2016) 7
## 5 Chicago, IL (2016) 7
## 6 Lincoln, NE (2016) 6
## 7 Newark, NJ (2016) 5
## 8 Green Bay, WI (2016) 5
## 9 Las Vegas, NV (2016) 4
## 10 Detroit, MI (2016) 3
## 11 Grand Rapids, MI (2016) 3
## 12 Traverse City, MI (2016) 3
## 13 Houston, TX (2016) 2
## 14 New Orleans, LA (2016) 2
## 15 San Diego, CA (2016) 1
tibble::tibble(locale=heightDistData$sourceName,
cluster=kmeans(dist(heightDistData[3:ncol(heightDistData)]), centers=8, nstart=1000)$cluster
) %>%
arrange(-cluster)
## # A tibble: 15 x 2
## locale cluster
## <chr> <int>
## 1 San Diego, CA (2016) 8
## 2 Indianapolis, IN (2016) 7
## 3 Milwaukee, WI (2016) 7
## 4 Madison, WI (2016) 7
## 5 Minneapolis, MN (2016) 7
## 6 Chicago, IL (2016) 7
## 7 Houston, TX (2016) 6
## 8 New Orleans, LA (2016) 6
## 9 Las Vegas, NV (2016) 5
## 10 Detroit, MI (2016) 4
## 11 Grand Rapids, MI (2016) 4
## 12 Traverse City, MI (2016) 4
## 13 Newark, NJ (2016) 3
## 14 Green Bay, WI (2016) 2
## 15 Lincoln, NE (2016) 1
With five segments, there is:
Adding the sixth cluster splits apart Las Vegas and Lincoln. Adding the seventh cluster cleaves Grand Rapids-Traverse City-Detroit from the other cold weather cities. Adding the eighth cluster splits apart Green Bay and Newark.
METAR data include descriptions of the precipitation occuring at any given time. Two of the most common precipitation forms are rain (RA) and SN(). These can occur together, denoted as RASN or SNRA in the METAR.
Further, the precipitation type can be classified using a prefix as light (-), moderate (no prefix), or heavy (+). So, RA would be moderate rain, -SNRA would be a light snow-rain mix, +RA would be heavy rain.
Additionally, the timing of the precipitation event is captured in the remarks using B (begin) and E (end). So, an hourly METAR of RAB20E35B50 would mean rain started at 20 past the hour, ended at 35 past the hour, and began again at 50 past the hour. Since METAR are often taken just before the top of the hour, a four-digit time is used if it is in the ‘previous’ hour; for example, RAB1959E36 in the 2053Z METAR.
Broadly speaking, precipitation at a specific point in time can be extracted from the main METAR (preceding RMK) whil prcipitation intervals can be extracted by parsing the remarks (following RMK). There can be misalignment between these; for example, an interval that suggests rain is occuring at a time the METAR does not show rain.
There are two key functions for the extraction process:
Helper function include:
Example code includes:
# Extract the precipitation begin and end times from a processed METAR file
extractPrecipData <- function(df,
pType,
showRegex=TRUE,
showSTR=FALSE,
showTable=FALSE,
dupError=TRUE
) {
# FUNCTION ARGUMENTS:
# df: the data frame or tibble
# pType: the regex code to be used
# showRegex: boolean, whether to print out the regex code used
# showSTR: boolean, whether to show the file str()
# showTable: boolean, whether to show the file counts of start and end times
# dupError: boolean, if TRUE will throw an error if any duplicates are found; if FALSE just notes it
# Extract the METAR and the date-time from a processed list
procMET <- df %>%
select(origMETAR, dtime)
# Check whether there are comments for the desired pType either beginning or ending
keyPattern <- paste0("(", pType, "[B|E]\\d+[0-9BE]*)")
if (showRegex) { cat("\nRegex search code is:", keyPattern, "\n\n") }
procMET <- procMET %>%
mutate(precipData=str_extract(origMETAR, pattern=keyPattern),
isPrecip=grepl(paste0(pType, "[B|E]"), origMETAR, perl=TRUE),
nBegin=pmax(0, str_count(precipData, "B"), na.rm=TRUE),
nEnd=pmax(0, str_count(precipData, "E"), na.rm=TRUE),
dateUTC=lubridate::date(dtime),
hourUTC=lubridate::hour(dtime)
)
if (showSTR) {
str(procMET)
cat("\n")
}
# Check the counts of precipitation beginning and rain ending
if (showTable) {
procMET %>%
count(isPrecip, nBegin, nEnd) %>%
print()
}
# Check that the file is unique by time
dupAns <- procMET %>%
select(dateUTC, hourUTC) %>%
duplicated %>%
any()
if (dupError) {
if (dupAns) { stop("Duplicates observed, investigate and fix\n") }
} else {
cat("\nAre there any problems with duplicated keys?", dupAns, "\n\n")
}
procMET
}
# Helper function to extract the beginning and ending times using str_extract_all
getBeginEndTimeMatrix <- function(file,
pullVar="precipData",
pState="B"
) {
# FUNCTION ARGUMENTS:
# file: file containing data
# pullVar: the variable of interest in file
# pState: the precipitation state of interest - B (begin) or E (end)
file %>%
pull(pullVar) %>%
str_extract_all(paste0(pState, "\\d+"), simplify=TRUE) %>%
as.data.frame(stringsAsFactors=FALSE)
}
getBeginEndTimeVector <- function(timeExtractFile,
origFullFile,
extractVar,
extractSym
) {
# FUNCTION ARGUMENTS:
# timeExtractFile: the file containing the extracted begin and end times
# origFullFile: the file that the extraction was run from (includes date and time)
# extractVar: the variable name for the column to be extracted from
# extractSym: whether it is B (begin) or E (end)
timeExtractFile %>%
cbind(origFullFile[, c("dateUTC", "hourUTC")]) %>%
mutate(hourChar=str_pad(str_trim(as.character(hourUTC)), width=2, pad="0")) %>%
apply(1, FUN=extractTime, var=extractVar, sym=extractSym)
}
# Convert the rain begins data to the hour and minute associated to the UTC
extractTime <- function(x,
var,
sym="B"
) {
# FUNCTION ARGUMENTS:
# x: the named vector
# var: the variable name to extract
# sym: whether the suppression needs to be on B (begin) or E (end) to convert to a zero-padded number
# Extracting the time is tricky
# If NA or blank, then the time is NA
# Replace the leading B or E and there should then be a 2-digit or 4-digit number
# If 2-digit, then append date amd hour in front of it
# If 4-digit, then append date in front of it
# Exception: if 4-digit and begins with 23, then need to use the previous day
if (is.na(x[var]) | x[var]=="") {
utcReturn <- NA
}
else {
utcUse <- str_replace(x[var], sym, "")
if (str_length(utcUse)==4) {
utcReturn <- paste0(x["dateUTC"], " ", utcUse)
# If a 4-digit time starts with 23 and is in the 0Z METAR, it is part of the previous day
if(str_sub(utcUse, 1, 2)=="23" & as.numeric(x["hourChar"])==0) {
utcReturn <- paste0(as.Date(x["dateUTC"]) - 1, " ", utcUse)
}
} else if (str_length(utcUse)==2) {
utcReturn <- paste0(x["dateUTC"], " ", x["hourChar"], utcUse)
} else {
cat("\nCannot parse data: ", x, "\n", x[var], "\n", var, sym, utcUse)
stop()
}
}
utcReturn
}
suggestBeginEndTimes <- function(df,
regMatch,
nColMax=3,
printNACols=FALSE
) {
# FUNCTION ARGUMENTS
# df: the data frame to be examined
# regMatch: the relevant regex extraction code
# nColMax: the maximum number of columns allowed for precipitation begins and precipitation ends
# printNACols: boolean, whether to print the NA column sums
# Pull the data and check for the specified precipitation pattern and lags
# Needs to follow the \\d{6}Z that shows the time and precede the RMK that denotes the remarks
sugStates <- df %>%
select(dtime, origMETAR) %>%
mutate(curPrecip=str_detect(origMETAR, paste0("\\d{6}Z.*", regMatch, ".*RMK")),
lagPrecip=lag(curPrecip, 1)
)
# Use the analysis data to look for begins and ends flagged in the remarks
sugBE <- extractPrecipData(sugStates, pType=regMatch)
# Inner join the data by dtime
sugBEJoin <- sugBE %>%
select(dtime, precipData, chgPrecip=isPrecip, dateUTC, hourUTC)
sugStates <- sugStates %>%
inner_join(sugBEJoin, by="dtime")
# Get the beginning and end times data for the desired precipitation type
sugBegin <- getBeginEndTimeMatrix(sugStates, pState="B")
sugEnd <- getBeginEndTimeMatrix(sugStates, pState="E")
# Ensure that user-defined maximum number of columns is not exceeded
if (ncol(sugBegin) > nColMax | ncol(sugEnd) > nColMax) {
stop("Function argument set to allow only for 0 - ", nColMax, "columns, investigate and fix")
}
# For each iteration, create testBTn and testETn (use NA if no column)
for (ctr in 1:nColMax) {
# Create NA as the baseline
assign(paste0("testBT", ctr), NA)
assign(paste0("testET", ctr), NA)
# Assign begin if exists
if (ncol(sugBegin) >= ctr) {
bt <- getBeginEndTimeVector(sugBegin, sugStates, extractVar=paste0("V", ctr), extractSym="B")
assign(paste0("testBT", ctr), bt)
}
# Assign end if exists
if (ncol(sugEnd) >= ctr) {
et <- getBeginEndTimeVector(sugEnd, sugStates, extractVar=paste0("V", ctr), extractSym="E")
assign(paste0("testET", ctr), et)
}
}
# Integrate to a single file, allowing for nMaxCol applications of variables
# Create file sugExceptions as sugStates and initialize the columns for begins and ends
sugExceptions <- sugStates %>%
mutate(begins=0, ends=0)
# Loop through up to nColMax, adding the appropriate bn=testBTn and en=testETn
for (ctr in 1:nColMax) {
sugExceptions <- sugExceptions %>%
mutate(!!paste0("b", ctr) := get(paste0("testBT", ctr)),
!!paste0("e", ctr) := get(paste0("testET", ctr)),
begins=begins + ifelse(is.na(get(paste0("b", ctr))), 0, 1),
ends=ends + ifelse(is.na(get(paste0("e", ctr))), 0, 1)
)
}
# Create the final version by checking key criteria
# Add a criteria for b before e if previous; and e before b if not previous?
sugExceptions <- sugExceptions %>%
mutate(etob=begins > ends,
btoe=ends > begins,
needBegin=curPrecip & !lagPrecip & !etob,
needEnd=!curPrecip & lagPrecip & !btoe,
overBegin=etob & (lagPrecip | !curPrecip),
overEnd=btoe & (curPrecip | !lagPrecip)
)
# Report on the NA status by column
if (printNACols) {
colSums(is.na(sugExceptions)) %>%
print()
}
# Flag potential issues
cat("\nNeed Begin time\n")
sugExceptions %>%
filter(needBegin) %>%
select(dtime, origMETAR) %>%
print()
cat("\nNeed End time\n")
sugExceptions %>%
filter(needEnd) %>%
select(dtime, origMETAR) %>%
print()
cat("\nExtraneous Begin time\n")
sugExceptions %>%
filter(overBegin) %>%
select(dtime, paste0("b", 1:nColMax)) %>%
print()
cat("\nExtraneous End time\n")
sugExceptions %>%
filter(overEnd) %>%
select(dtime, paste0("e", 1:nColMax)) %>%
print()
cat("\nWrong amount of begins or ends\n")
sugExceptions %>%
mutate(absMatch=abs(begins-ends), absLag=lag(absMatch, 1), absLead=lead(absMatch, 1)) %>%
filter(pmax(absMatch, absLag, absLead) > 1) %>%
select(dtime, paste0("e", 1:nColMax), paste0("b", 1:nColMax)) %>%
print()
sugExceptions %>%
filter(needBegin | needEnd | overBegin | overEnd | abs(begins - ends) > 1)
}
A function is then created to run the full precipitation extraction:
# Function to create the time intervals data
createPrecipInterval <- function(endVector,
beginVector,
endExclude=c(),
beginExclude=c(),
sState=FALSE,
nMinPrint=1,
maxProb=1000,
nMaxPrint=1
) {
# FUNCTION ARGUMENTS:
# endVector: vector of end times
# beginVector: vector of begin times
# endExclude: vector of end times to exclude
# beginExclude: vector of begin times to exclude
# sState: the starting precipitation state (TRUE for precipitation., FALSE for no precipitation)
# nMinPrint: index for the first position of negative intervals to show (typically 1)
# maxProb: maximum duration (minutes) of precipitation to flag a print out
# nMaxPrint: index for the first position of very long intervals to show (typically 1)
# If the starting state is one of precipitation, allow it to 'burn in' by deleting the first end time
if(sState) { endVector <- endVector[2:length(endVector)] }
# Create the interval data
endsUse <- endVector[!(endVector %in% lubridate::ymd_hm(endExclude))]
beginsUse <- beginVector[!(beginVector %in% lubridate::ymd_hm(beginExclude))]
intervalData <- endsUse - beginsUse
# Show a summary of the interval data
print(summary(as.numeric(intervalData)))
# If there are any non-positive intervals, print the data causing the first of them
if (min(as.numeric(intervalData)) <= 0) {
cat("\nProblem Detected - Intervals are not positive. Data to help investigate\n")
posns <- which(as.numeric(intervalData) <= 0)
posns <- max(1, posns[1]-5):min(length(beginsUse), length(endsUse), posns[nMinPrint]+5)
cat("\nVector of Begins\n")
print(lubridate::as_datetime(beginsUse[posns]))
cat("\nVector of Ends\n")
print(lubridate::as_datetime(endsUse[posns]))
cat("\n")
}
# If there are any very long positive intervals, print the data causing the first five of them
if (max(as.numeric(intervalData)) >= maxProb) {
cat("\nPotential problem Detected - very long. Data to help investigate\n")
posns <- which(as.numeric(intervalData) >= maxProb)
cat("\nPositions with problems are:", posns)
posns <- max(1, posns[1]-5):min(length(beginsUse), length(endsUse), posns[min(length(posns), nMaxPrint)]+5)
cat("\nVector of Begins\n")
print(lubridate::as_datetime(beginsUse[posns]))
cat("\nVector of Ends\n")
print(lubridate::as_datetime(endsUse[posns]))
cat("\n")
}
# Return the interval data
intervalData
}
createPrecipIntervalPlots <- function(intervalData,
beginsData,
titleText,
yAxisText,
beginExclude=c(),
returnPlotsAndData=FALSE
) {
# FUNCTION ARGUMENTS:
# intervalData: the precipitation interval data
# beginsData: the precipitation begin times
# titleText: plot title
# yAxisText: plot y-axis
# beginExclude: begin times to exclude
# returnPlotsAndData: whether to return the plot objects
# Exclude any data from begins as needed
beginsData <- beginsData[!(beginsData %in% lubridate::ymd_hm(beginExclude))]
# Create a plotting data frame
histFrame <- data.frame(minutesPrecip=as.numeric(intervalData),
month=lubridate::month(beginsData),
rainDate=lubridate::date(beginsData)
) %>%
mutate(hoursPrecip=minutesPrecip/60)
# Plot the precipitation durations in hours
p1 <- histFrame %>%
ggplot(aes(x=hoursPrecip)) +
geom_histogram() +
labs(title=titleText, x=yAxisText,
subtitle="Distribution of hours per unique precipitation event"
)
print(p1)
# Plot the precipitation by day in hours
p2 <- histFrame %>%
group_by(rainDate) %>%
summarize(hoursPrecip=sum(hoursPrecip)) %>%
ggplot(aes(x=hoursPrecip)) +
geom_histogram() +
labs(title=titleText, x=yAxisText,
subtitle="Distribution of hours per day of precipitation (on days when 1+ minutes occurred)"
)
print(p2)
# Plot the rain totals (in hours) by month
# Create a data frame of all months and merge in precipitation data as an where available (0 otherwise)
monthFrame <- histFrame %>%
group_by(month) %>%
summarize(minutesPrecip=sum(minutesPrecip), hoursPrecip=sum(hoursPrecip), nPrecip=n()) %>%
right_join(data.frame(month=1:12, monthName=month.abb[1:12]), by="month") %>%
tidyr::replace_na(list(minutesPrecip=0, hoursPrecip=0, nPrecip=0))
# print(monthFrame)
p3 <- monthFrame %>%
ggplot(aes(x=factor(monthName, levels=month.abb[1:12]), y=hoursPrecip)) +
geom_col() +
labs(title=titleText, y=yAxisText, x="") +
geom_text(aes(y=2 + hoursPrecip, label=round(hoursPrecip, 1)))
print(p3)
if (returnPlotsAndData) {
list(histFrame=histFrame, monthFrame=monthFrame, p1=p1, p2=p2, p3=p3)
} else {
NULL
}
}
# Precipitation regex to name mapper
precipMapper <- c("(?<!BL)SN"="Snowfall",
"(?<!FZ)RA"="Rainfall",
"(?<!VC)TS"="Thunderstorm"
)
# Combining all the functions in one place
runFullPrecipExtraction <- function(df,
pType,
titleText=NULL,
yAxisText=NULL,
endExclude=c(),
beginExclude=c(),
endAdd=c(),
beginAdd=c(),
maxProb=1000,
sState=FALSE,
makePlots=TRUE,
returnPlotsAndData=FALSE,
axisMapper=precipMapper,
titleMapper=cityNameMapper,
nColMax=3,
frameName=deparse(substitute(df))
) {
# FUNCTION ARGUMENTS
# df: the data frame or tibble
# ptype: the regex for the precipitation type of interest
# titleText: title text for the associated plot
# yAxisText: y-axis text for the associated plot
# endExclude: vector of end times to be excluded, formatted as yyyy-mm-dd hhmm (all zero-padded)
# beginExclude: vector of begin times to be excluded, formatted as yyyy-mm-dd hhmm (all zero-padded)
# endAdd: vector of end times to be added, formatted as yyyy-mm-dd hhmm (all zero-padded)
# beginAdd: vector of begin times to be added, formatted as yyyy-mm-dd hhmm (all zero-padded)
# maxProb: length of precipitation (in minutes) beyond which a note is thrown
# sState: whether the starting state is with precipitation
# makePlots: whether to create plots
# returnPlotsAndData: whether to return the plot object with the other data
# axisMapper: mapping file to create y-axis names from pType
# titleMapper: mapping file to create the title names from df
# nColMax: the maximum allowed number of columns for each of precipitation begin and end times
# Create the yAxisText if it has not been provided
if (is.null(yAxisText)) {
yAxisText <- paste0("Hours of ", axisMapper[pType])
}
# Create the title if it has not been provided
if (is.null(titleText)) {
titleText <- paste0(titleMapper[frameName], " - Hours of ", axisMapper[pType])
}
# Extract the precipitation data from a specified processed METAR file
testFileProc <- extractPrecipData(df, pType=pType)
# Confirm that the two-column specification is met (should relax hard-coding on this)
testBegin <- getBeginEndTimeMatrix(testFileProc, pState="B")
testEnd <- getBeginEndTimeMatrix(testFileProc, pState="E")
if ((ncol(testBegin) > nColMax) | ((ncol(testEnd) > nColMax))) {
cat("\nBegin columns:", ncol(testBegin), "\t\tEndcolumns:", ncol(testEnd))
stop("Hard-coded for at most", nColMax, "columns each of begin/end- Fix")
}
# Initialize testAllBegins and testAllEnds
testAllBegins <- beginAdd
testAllEnds <- endAdd
# For each iteration, append to testAllBegins and testAllEnds as appropriate
for (ctr in 1:nColMax) {
# Append begin if exists
if (ncol(testBegin) >= ctr) {
bt <- getBeginEndTimeVector(testBegin, testFileProc, extractVar=paste0("V", ctr), extractSym="B")
testAllBegins <- c(testAllBegins, bt[!is.na(bt)])
}
# Assign end if exists
if (ncol(testEnd) >= ctr) {
et <- getBeginEndTimeVector(testEnd, testFileProc, extractVar=paste0("V", ctr), extractSym="E")
testAllEnds <- c(testAllEnds, et[!is.na(et)])
}
}
# Convert to date-time and sort
testAllBegins <- testAllBegins %>%
lubridate::ymd_hm() %>%
sort()
testAllEnds <- testAllEnds %>%
lubridate::ymd_hm() %>%
sort()
# Create the intervals
testIntervals <- createPrecipInterval(testAllEnds,
testAllBegins,
endExclude=endExclude,
beginExclude=beginExclude,
maxProb=maxProb,
sState=sState
)
# Create the precipitation plots
plotOut <- NULL
if (makePlots) {
plotOut <- createPrecipIntervalPlots(testIntervals,
testAllBegins,
titleText=titleText,
yAxisText=yAxisText,
beginExclude=beginExclude,
returnPlotsAndData=returnPlotsAndData
)
}
if (!returnPlotsAndData) { plotOut <- NULL }
# Return all of the key files, along with the parameters used
keyParams <- list(fileName=frameName,
pType=pType,
endExclude=endExclude,
beginExclude=beginExclude,
endAdd=endAdd,
beginAdd=beginAdd,
maxProb=maxProb,
sState=sState
)
list(keyParams=keyParams,
testFileProc=testFileProc,
testAllBegins=testAllBegins[!(testAllBegins %in% lubridate::ymd_hm(beginExclude))],
testAllEnds=testAllEnds[!(testAllEnds %in% lubridate::ymd_hm(endExclude))],
testIntervals=testIntervals,
plotOut=plotOut
)
}
The functions are then run, using the Chicago 2016 data:
# Run for Chicago 2016, extracting rain that is not freezing rain
kord2016ExceptRA <- suggestBeginEndTimes(kord_2016, regMatch="(?<!FZ)RA")
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 2 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-03-24 21:51:00 KORD 242151Z 34014KT 1 3/4SM -RASN BR SCT010 OVC016 01/00~
## 2 2016-08-29 19:51:00 KORD 291951Z 25009KT 1SM R10L/1200VP6000FT +TSRA BKN041CB~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-07-13 18:51:00 KORD 131851Z 22009G18KT 10SM FEW027 SCT047 BKN250 28/22 A~
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
kord2016ExceptRA
## # A tibble: 3 x 22
## dtime origMETAR curPrecip lagPrecip precipData chgPrecip
## <dttm> <chr> <lgl> <lgl> <chr> <lgl>
## 1 2016-03-24 21:51:00 KORD 242~ TRUE FALSE <NA> FALSE
## 2 2016-07-13 18:51:00 KORD 131~ FALSE TRUE <NA> FALSE
## 3 2016-08-29 19:51:00 KORD 291~ TRUE FALSE <NA> FALSE
## # ... with 16 more variables: dateUTC <date>, hourUTC <int>, begins <dbl>,
## # ends <dbl>, b1 <chr>, e1 <chr>, b2 <chr>, e2 <chr>, b3 <lgl>, e3 <lgl>,
## # etob <lgl>, btoe <lgl>, needBegin <lgl>, needEnd <lgl>, overBegin <lgl>,
## # overEnd <lgl>
# Run extraction for Chicago 2016
# Attempt to use on the kord2016 rain data
kordRain2016Test <- runFullPrecipExtraction(kord_2016,
pType="(?<!FZ)RA",
titleText="Chicago, IL Rainfall (hours) in 2016",
yAxisText="Hours of Rain",
endExclude=c(),
beginExclude=c(),
endAdd=c("2016-07-13 1851"),
beginAdd=c("2016-03-24 2151", "2016-08-29 1951"),
maxProb=1440,
sState=FALSE,
makePlots=TRUE
)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 14.00 33.50 70.51 83.50 1243.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
A function can then be written to test the intervals for consistency:
intervalConsistency <- function(lst,
pType,
maxPrint=NULL
){
# FUNCTION ARGUMENTS
# lst: list
# pType: precipitation type
# maxPrint: maximum number of mismatches to print (default is to print all)
require(lubridate)
# Extract the beginning and interval times
begins <- lst[["testAllBegins"]]
ends <- lst[["testAllEnds"]]
durs <- lst[["testIntervals"]]
# Create intervals from the raw list file
precipInts <- lubridate::interval(begins, begins + durs - 1)
# Extract the METAR and date-time information
metar <- lst[["testFileProc"]][["origMETAR"]]
dtime <- lst[["testFileProc"]][["dtime"]]
# Take each METAR observation and check two factors
# Is the precipitation type recorded in that METAR?
# Does that METAR fall in any of the intervals?
precipMETAR <- grepl(paste0("\\d{6}Z.*", pType, ".*RMK"), metar, perl=TRUE)
intMETAR <- sapply(dtime, FUN=function(x) {x %within% precipInts %>% any()})
# Check for the consistency of the observations and print the mismatches
print(table(precipMETAR, intMETAR))
mism <- which(precipMETAR != intMETAR)
if (length(mism) == 0) {
cat("\nFull matches between METAR observations and intervals\n")
} else {
if (is.null(maxPrint)) { maxPrint <- length(mism) }
for (x in mism[1:min(maxPrint, length(mism))]) {
cat("\nMismatch at time", strftime(dtime[x], format="%Y-%m-%d %H:%M", tz="UTC"), "UTC\n")
print(metar[max(1, x-2):min(length(metar), x+2)])
}
}
list(precipInts=precipInts, mismatches=mism, mismatchTimes=dtime[mism])
}
And, the function can then be run for the Chicago 2016 data:
tmp <- intervalConsistency(kordRain2016Test, pType="(?<!FZ)RA")
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8349 0
## TRUE 0 466
##
## Full matches between METAR observations and intervals
So, the process works to extract precipitation data, and for the case of Chicago 2016, creates alignment using only the addition of 1 end time and 2 begin times.
An additional helper function is written to take the exceptions from suggestBeginEndTimes() and to create the vectors for endExclude(), beginExclude(), endAdd(), and beginAdd() automatically:
# Helper function to create times in the format preferred by the other functions
timeFormatter <- function(timeVec) {
if (length(timeVec) == 0) {
return(character(0))
}
paste0(as.Date(timeVec),
" ",
str_pad(lubridate::hour(timeVec), pad="0", side="left", width=2),
str_pad(lubridate::minute(timeVec), pad="0", side="left", width=2)
)
}
convertExceptionsToTimes <- function(excFile, maxCol=3) {
# Basic beginAdd (just add the METAR time)
beginAdd <- excFile %>%
filter(needBegin) %>%
pull(dtime) %>%
timeFormatter()
# Basic endAdd (just add the METAR time)
endAdd <- excFile %>%
filter(needEnd) %>%
pull(dtime) %>%
timeFormatter()
# Basic endExclude (just delete the earliest end time - maybe add a guard check later)
endExclude <- excFile %>%
filter(overEnd) %>%
select_at(vars(all_of(paste0("e", 1:maxCol)))) %>%
pull(1)
# Basic beginExclude (just delete the earliest begin time - maybe add a guard check later)
beginExclude <- excFile %>%
filter(overBegin) %>%
select_at(vars(all_of(paste0("b", 1:maxCol)))) %>%
pull(1)
list(endAdd=endAdd, beginAdd=beginAdd, endExclude=endExclude, beginExclude=beginExclude)
}
And the process can be attempted on the 2016 Chicago snow data:
# Run for Chicago 2016, extracting snow that is not blowing snow
kord2016ExceptSN <- suggestBeginEndTimes(kord_2016, regMatch="(?<!BL)SN")
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 4 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-02-10 00:51:00 KORD 100051Z 31012KT 4SM -SN FEW024 OVC040 M09/M13 A2993 ~
## 2 2016-03-24 21:51:00 KORD 242151Z 34014KT 1 3/4SM -RASN BR SCT010 OVC016 01/00~
## 3 2016-04-08 22:51:00 KORD 082251Z 31014G31KT 9SM -SN SCT045 BKN050 OVC080 02/M~
## 4 2016-12-24 01:51:00 KORD 240151Z 17011KT 4SM -SN BR SCT015 OVC039 01/M01 A299~
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 1 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-04-09 04:51:00 2016-04-09 0446 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
kord2016ExceptSNList <- convertExceptionsToTimes(kord2016ExceptSN)
# Attempt to use on the kord2016 snow data
kordSnow2016Test <- runFullPrecipExtraction(kord_2016,
pType="(?<!BL)SN",
titleText="Chicago, IL Snowfall (hours) in 2016",
yAxisText="Hours of Snow",
endExclude=kord2016ExceptSNList$endExclude,
beginExclude=kord2016ExceptSNList$beginExclude,
endAdd=kord2016ExceptSNList$endAdd,
beginAdd=kord2016ExceptSNList$beginAdd,
maxProb=1440,
sState=FALSE,
makePlots=TRUE
)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 20.25 47.50 134.12 140.75 1016.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
tmp <- intervalConsistency(kordSnow2016Test, pType="(?<!BL)SN")
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8532 0
## TRUE 0 283
##
## Full matches between METAR observations and intervals
While there is still some work to be done (the current function makes assumptions about the issues with the precipitation data and associated use of e1/b1, as well as not handling cases where there are, for example, 2 begins with no ends), the function shows promise in automating the process of handling precipitation data.
A few edits have been made to the underlying functions:
The functions can then be combined, allowing for everything to be run at once:
combinedPrecipitation <- function(df,
pType,
titleText=NULL,
yAxisText=NULL,
maxProb=1440,
sState=FALSE,
makePlots=TRUE,
axisMapper=precipMapper,
titleMapper=cityNameMapper,
frameName=NULL,
maxConsistencyPrint=NULL
) {
# FUNCTION ARGUMENTS:
# df: the data frame or tibble containing the METAR and dtime
# pType: the regex code for extracting the relevant precipitation type
# titleText: Desired plot title (will be created from name of df if NULL)
# yAxisText: desired y-axis labels (will be created from regex if NULL)
# maxProb: flag any precipitation events longer than this number of minutes
# sState: boolean, is there precipitation in the first record?
# makePlots: boolean, should plots be created?
# axisMapper: mapping file for creating plot axis name from precipitation type
# titleMapper: mapping file for creating plot title from data frame name
# frameName: name of the data frame (kluge due to deparse(substitute(df)) returning "df" if this function is called by another function or in a for loop
# maxConsistencyPrint: how many mismatches should be printed (default, NULL, prints all)
# Create titleText and yAxisText if they are not already created
if (is.null(yAxisText)) {
yAxisText <- paste0("Hours of ", axisMapper[pType])
}
# Create the frame name if it has nott been passed
if (is.null(frameName)) {
frameName <- deparse(substitute(df))
}
# Create the title if it has not been provided
if (is.null(titleText)) {
titleText <- paste0(titleMapper[frameName], " - Hours of ", axisMapper[pType])
}
# Get the suggested edits to the begin and end times, then convert to a usable list
excSuggest <- suggestBeginEndTimes(df, regMatch=pType)
excList <- convertExceptionsToTimes(excSuggest)
# Run the full precipitation extraction
precipData <- runFullPrecipExtraction(df,
pType=pType,
titleText=titleText,
yAxisText=yAxisText,
endExclude=excList$endExclude,
beginExclude=excList$beginExclude,
endAdd=excList$endAdd,
beginAdd=excList$beginAdd,
maxProb=maxProb,
sState=sState,
makePlots=makePlots,
frameName=frameName
)
# Check for consistency
precipIntervals <- intervalConsistency(precipData, pType=pType, maxPrint=maxConsistencyPrint)
# For the time being, return the most important data (the precipData list)
precipData
}
The function can then be tested on a few files:
# Run for Chicago 2016 for snow, rain, and thunderstorm
kord2016SN <- combinedPrecipitation(kord_2016, pType="(?<!BL)SN")
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 4 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-02-10 00:51:00 KORD 100051Z 31012KT 4SM -SN FEW024 OVC040 M09/M13 A2993 ~
## 2 2016-03-24 21:51:00 KORD 242151Z 34014KT 1 3/4SM -RASN BR SCT010 OVC016 01/00~
## 3 2016-04-08 22:51:00 KORD 082251Z 31014G31KT 9SM -SN SCT045 BKN050 OVC080 02/M~
## 4 2016-12-24 01:51:00 KORD 240151Z 17011KT 4SM -SN BR SCT015 OVC039 01/M01 A299~
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 1 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-04-09 04:51:00 2016-04-09 0446 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 20.25 47.50 134.12 140.75 1016.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8532 0
## TRUE 0 283
##
## Full matches between METAR observations and intervals
kord2016RA <- combinedPrecipitation(kord_2016, pType="(?<!FZ)RA")
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 2 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-03-24 21:51:00 KORD 242151Z 34014KT 1 3/4SM -RASN BR SCT010 OVC016 01/00~
## 2 2016-08-29 19:51:00 KORD 291951Z 25009KT 1SM R10L/1200VP6000FT +TSRA BKN041CB~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-07-13 18:51:00 KORD 131851Z 22009G18KT 10SM FEW027 SCT047 BKN250 28/22 A~
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 14.00 33.50 70.51 83.50 1243.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8349 0
## TRUE 0 466
##
## Full matches between METAR observations and intervals
kord2016TS <- combinedPrecipitation(kord_2016, pType="(?<!VC)TS")
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-10-12 17:51:00 KORD 121751Z 23008G19KT 3SM R10L/5000VP6000FT -TSRA BR FE~
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 2 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-08-28 08:51:00 2016-08-28 0825 <NA> NA
## 2 2016-09-21 17:51:00 2016-09-21 1744 <NA> NA
##
## Extraneous End time
## # A tibble: 3 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-08-28 09:51:00 2016-08-28 0910 <NA> NA
## 2 2016-09-21 18:51:00 2016-09-21 1759 <NA> NA
## 3 2016-10-06 21:51:00 2016-10-06 2131 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.00 23.50 43.50 62.66 93.25 266.00
##
## Problem Detected - Intervals are not positive. Data to help investigate
##
## Vector of Begins
## [1] "2016-09-28 23:19:00 UTC" "2016-10-01 16:02:00 UTC"
## [3] "2016-10-06 10:29:00 UTC" "2016-10-06 23:56:00 UTC"
## [5] "2016-10-12 12:16:00 UTC" "2016-10-12 17:36:00 UTC"
## [7] "2016-10-12 17:51:00 UTC" "2016-10-16 09:57:00 UTC"
## [9] "2016-10-26 14:48:00 UTC" "2016-10-30 07:56:00 UTC"
## [11] "2016-11-02 21:06:00 UTC"
##
## Vector of Ends
## [1] "2016-09-29 00:12:00 UTC" "2016-10-01 16:21:00 UTC"
## [3] "2016-10-06 11:04:00 UTC" "2016-10-07 00:25:00 UTC"
## [5] "2016-10-12 12:41:00 UTC" "2016-10-12 17:33:00 UTC"
## [7] "2016-10-12 18:05:00 UTC" "2016-10-16 11:20:00 UTC"
## [9] "2016-10-26 15:28:00 UTC" "2016-10-30 09:37:00 UTC"
## [11] "2016-11-02 21:21:00 UTC"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8749 0
## TRUE 0 66
##
## Full matches between METAR observations and intervals
The thunderstorm data contains an interesting consistency error:
kord_2016 %>%
filter(monthint==10, day==12, lubridate::hour(dtime) %in% c(14, 15, 16, 17, 18, 19)) %>%
pull(origMETAR)
## [1] "KORD 121451Z 21013G21KT 10SM FEW050 SCT110 BKN140 OVC250 18/14 A2993 RMK AO2 SLP133 CB DSNT SW TCU DSNT S 60000 T01830144 56005"
## [2] "KORD 121551Z 23011G19KT 10SM FEW021 SCT050CB SCT120 BKN160 BKN250 20/15 A2992 RMK AO2 LTG DSNT SW SLP130 CB S MOV NE CB DSNT SW T02000150"
## [3] "KORD 121651Z 24012G18KT 8SM SCT015 OVC020 18/16 A2995 RMK AO2 LTG DSNT SE RAB1554E33 SLP139 P0001 T01780156"
## [4] "KORD 121751Z 23008G19KT 3SM R10L/5000VP6000FT -TSRA BR FEW014 BKN060CB OVC095 17/16 A2993 RMK AO2 LTG DSNT NE AND SE AND SW RAB1652 TSE33B36 SLP133 OCNL LTGCG E TS E MOV NE P0006 60007 T01720161 10206 20161 50003"
## [5] "KORD 121851Z 23008KT 6SM -RA BR BKN012 BKN095 OVC120 18/16 A2990 RMK AO2 TSE05 SLP123 P0002 T01780161"
## [6] "KORD 121951Z 26013KT 5SM -RA BR BKN015 BKN019 OVC095 18/16 A2990 RMK AO2 RAE1857B20 SLP124 P0000 T01780156"
The algorithm as currently set up is creating a -3 minute interval (from 1736 to 1733) and then also creating a start time of 1751 to match the active -TSRA and following TSE05.
In practice, it is more likely that the TSE33 should be deleted, leaving a short interval of TSB36 (1736) through TSE05 (1805) that overlaps the -TSRA observation at 1751.
Next steps are to work through this further consistency check and test the functions on other locales.
An updated function suggestBeginEndTimes() is created to better manage the situation above. Broadly speaking, there are three types of issues with precipitation data (assuming focus on a single precipitation type, such as rain):
There are two ancillary precipitation issues that can be observed:
The primary issues will be addressed first in an updated function:
# Helper function to print out the potential issue
printBeginEndIssues <- function(df, nColMax=3) {
# FUNCTION ARGUMENTS:
# df: the data frame or tibble containing the issues
# nColMax: the maximum number of columns allowed for precipitation begins and precipitation ends
# Variable names in the file are assumed to have a specific format as hard-coded in the function
cat("\nNeed Begin time\n")
df %>%
filter(needBegin) %>%
select(dtime, origMETAR) %>%
print()
cat("\nNeed End time\n")
df %>%
filter(needEnd) %>%
select(dtime, origMETAR) %>%
print()
cat("\nExtraneous Begin time\n")
df %>%
filter(overBegin) %>%
select(dtime, paste0("b", 1:nColMax)) %>%
print()
cat("\nExtraneous End time\n")
df %>%
filter(overEnd) %>%
select(dtime, paste0("e", 1:nColMax)) %>%
print()
cat("\nWrong amount of begins or ends\n")
df %>%
filter(tooManyBE) %>%
select(dtime, paste0("e", 1:nColMax), paste0("b", 1:nColMax)) %>%
print()
}
# Update suggestBeginEndTimes
suggestBeginEndTimes <- function(df,
regMatch,
nColMax=3,
printNACols=FALSE,
printIssues=TRUE
) {
# FUNCTION ARGUMENTS
# df: the data frame to be examined
# regMatch: the relevant regex extraction code
# nColMax: the maximum number of columns allowed for precipitation begins and precipitation ends
# printNACols: boolean, whether to print the NA column sums
# printIssues: boolean, whether to print out all of the issues
# Pull the data and check for the specified precipitation pattern and lags
# Needs to follow the \\d{6}Z that shows the time and precede the RMK that denotes the remarks
sugStates <- df %>%
select(dtime, origMETAR) %>%
mutate(curPrecip=str_detect(origMETAR, paste0("\\d{6}Z.*", regMatch, ".*RMK")),
lagPrecip=lag(curPrecip, 1)
)
# Use the analysis data to look for begins and ends flagged in the remarks
sugBE <- extractPrecipData(sugStates, pType=regMatch)
# Inner join the data by dtime
sugBEJoin <- sugBE %>%
select(dtime, precipData, chgPrecip=isPrecip, dateUTC, hourUTC)
sugStates <- sugStates %>%
inner_join(sugBEJoin, by="dtime")
# Get the beginning and end times data for the desired precipitation type
sugBegin <- getBeginEndTimeMatrix(sugStates, pState="B")
sugEnd <- getBeginEndTimeMatrix(sugStates, pState="E")
# Ensure that user-defined maximum number of columns is not exceeded
if (ncol(sugBegin) > nColMax | ncol(sugEnd) > nColMax) {
stop("Function argument set to allow only for 0 - ", nColMax, "columns, investigate and fix")
}
# For each iteration, create testBTn and testETn (use NA if no column)
for (ctr in 1:nColMax) {
# Create NA as the baseline
assign(paste0("testBT", ctr), NA)
assign(paste0("testET", ctr), NA)
# Assign begin if exists
if (ncol(sugBegin) >= ctr) {
bt <- getBeginEndTimeVector(sugBegin, sugStates, extractVar=paste0("V", ctr), extractSym="B")
assign(paste0("testBT", ctr), bt)
}
# Assign end if exists
if (ncol(sugEnd) >= ctr) {
et <- getBeginEndTimeVector(sugEnd, sugStates, extractVar=paste0("V", ctr), extractSym="E")
assign(paste0("testET", ctr), et)
}
}
# Integrate to a single file, allowing for nMaxCol applications of variables
# Create file sugExceptions as sugStates and initialize the columns for begins and ends
sugExceptions <- sugStates %>%
mutate(begins=0, ends=0)
# Loop through up to nColMax, adding the appropriate bn=testBTn and en=testETn
for (ctr in 1:nColMax) {
sugExceptions <- sugExceptions %>%
mutate(!!paste0("b", ctr) := get(paste0("testBT", ctr)),
!!paste0("e", ctr) := get(paste0("testET", ctr)),
begins=begins + ifelse(is.na(get(paste0("b", ctr))), 0, 1),
ends=ends + ifelse(is.na(get(paste0("e", ctr))), 0, 1)
)
}
# Create the final version by checking key criteria
# Add a criteria for b before e if previous; and e before b if not previous?
# Potential issues include:
# Current and lagged state are the same when begins != ends
# Current is no, lagged is yes, ends <= begins
# Current is yes, lagged is no, begins <= ends
sugExceptions <- sugExceptions %>%
mutate(issueA=(lagPrecip==curPrecip & begins!=ends), # no state change should have begins==ends
issueB=(lagPrecip & !curPrecip & ends <= begins), # change to 'no' should have ends>bsgins
issueC=(curPrecip & !lagPrecip & begins <= ends), # change to 'yse' should have begins>ends
issueD=abs(begins-ends) > 1, # state is a 1/0 so net change should be -1, 0, or 1
needBegin=issueC & begins==0,
needEnd=issueB & ends==0,
overBegin=(issueA & (begins > ends)) | (issueB & (ends > 0)),
overEnd=(issueA & (ends > begins)) | (issueC & (begins > 0)),
tooManyBE=issueD
)
# Report on the NA status by column
if (printNACols) {
colSums(is.na(sugExceptions)) %>%
print()
}
# Print out the issue if requested
if (printIssues) {
printBeginEndIssues(sugExceptions, nColMax=nColMax)
}
sugExceptions %>%
filter(issueA | issueB | issueC | issueD)
}
The function can then be run on the Chicago 2016 snow data and compared to previous:
# Run for the Chicago snow data
tmp <- suggestBeginEndTimes(kord_2016, regMatch="(?<!BL)SN")
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 4 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-02-10 00:51:00 KORD 100051Z 31012KT 4SM -SN FEW024 OVC040 M09/M13 A2993 ~
## 2 2016-03-24 21:51:00 KORD 242151Z 34014KT 1 3/4SM -RASN BR SCT010 OVC016 01/00~
## 3 2016-04-08 22:51:00 KORD 082251Z 31014G31KT 9SM -SN SCT045 BKN050 OVC080 02/M~
## 4 2016-12-24 01:51:00 KORD 240151Z 17011KT 4SM -SN BR SCT015 OVC039 01/M01 A299~
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 1 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-04-09 04:51:00 2016-04-09 0446 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
# Compare to old Chicago snow data
namesAll <- intersect(names(kord2016ExceptSN), names(tmp))
cat("\nVariables deleted:", setdiff(names(kord2016ExceptSN), namesAll), "\n")
##
## Variables deleted: etob btoe
cat("\nVariables created:", setdiff(names(tmp), namesAll), "\n")
##
## Variables created: issueA issueB issueC issueD tooManyBE
identical(kord2016ExceptSN[, namesAll], tmp[, namesAll])
## [1] TRUE
So, for the Chicago 2016 snow data, the new function is pulling the same information, just with somewhat different variables (keeping an issue log as variables may be helpful later).
The data are also re-run on Chicago thunderstorms to see if the negative interval is correctly pulled out:
# Run for the Chicago thunderstorm data
tmp <- suggestBeginEndTimes(kord_2016, regMatch="(?<!VC)TS")
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 2 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-08-28 08:51:00 2016-08-28 0825 <NA> NA
## 2 2016-09-21 17:51:00 2016-09-21 1744 <NA> NA
##
## Extraneous End time
## # A tibble: 4 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-08-28 09:51:00 2016-08-28 0910 <NA> NA
## 2 2016-09-21 18:51:00 2016-09-21 1759 <NA> NA
## 3 2016-10-06 21:51:00 2016-10-06 2131 <NA> NA
## 4 2016-10-12 17:51:00 2016-10-12 1733 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
# Show the data
tmp %>%
select(dtime, curPrecip, lagPrecip, precipData, needBegin, needEnd, overBegin, overEnd, tooManyBE)
## # A tibble: 6 x 9
## dtime curPrecip lagPrecip precipData needBegin needEnd overBegin
## <dttm> <lgl> <lgl> <chr> <lgl> <lgl> <lgl>
## 1 2016-08-28 08:51:00 FALSE FALSE TSB25 FALSE FALSE TRUE
## 2 2016-08-28 09:51:00 FALSE FALSE TSE10 FALSE FALSE FALSE
## 3 2016-09-21 17:51:00 FALSE FALSE TSB44 FALSE FALSE TRUE
## 4 2016-09-21 18:51:00 FALSE FALSE TSE1759 FALSE FALSE FALSE
## 5 2016-10-06 21:51:00 FALSE FALSE TSE31 FALSE FALSE FALSE
## 6 2016-10-12 17:51:00 TRUE FALSE TSE33B36 FALSE FALSE FALSE
## # ... with 2 more variables: overEnd <lgl>, tooManyBE <lgl>
# Run through the conversion to begin and end times
tmp %>%
convertExceptionsToTimes()
## $endAdd
## character(0)
##
## $beginAdd
## character(0)
##
## $endExclude
## [1] "2016-08-28 0910" "2016-09-21 1759" "2016-10-06 2131" "2016-10-12 1733"
##
## $beginExclude
## [1] "2016-08-28 0825" "2016-09-21 1744"
It appears that this change will correct for the issue on 2016-10-12 with TSE33B36.
The process can then be run for the rain data in all of the files, with outputs saved to a single master list (initial exploration showed some issues with Minneapolis and New Orleans, so these are excluded to start):
# Create the list of files to run on - everything in 2016
fileNames <- grep(pattern="2016", names(cityNameMapper), value=TRUE)
fileNames <- fileNames[!(fileNames %in% c("kmsp_2016", "kmsy_2016"))]
print(fileNames)
## [1] "kdtw_2016" "kewr_2016" "kgrb_2016" "kgrr_2016" "kiah_2016" "kind_2016"
## [7] "klas_2016" "klnk_2016" "kmke_2016" "kmsn_2016" "kord_2016" "ksan_2016"
## [13] "ktvc_2016"
# Create a main storage list
listRA <- vector("list", length(fileNames))
names(listRA) <- fileNames
for (fName in fileNames) {
listRA[[fName]] <- combinedPrecipitation(get(fName), pType="(?<!FZ)RA", frameName=fName)
}
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 2 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-02-24 11:53:00 KDTW 241153Z 04014G19KT 5SM -RAPL SCT008 OVC012 01/M02 A2~
## 2 2016-04-07 23:53:00 KDTW 072353Z 30007KT 7SM -SNRA BKN022 OVC041 01/M01 A2962~
##
## Need End time
## # A tibble: 2 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-04-07 22:53:00 KDTW 072253Z 30013G19KT 1 1/2SM R03R/6000VP6000FT -SN BKN~
## 2 2016-08-16 14:53:00 KDTW 161453Z 22019G22 10SM BKN014 OVC030 24/22 A2993 RMK ~
##
## Extraneous Begin time
## # A tibble: 2 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-04-01 00:53:00 2016-04-01 0001 <NA> NA
## 2 2016-08-15 00:53:00 2016-08-14 2355 <NA> NA
##
## Extraneous End time
## # A tibble: 1 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-07-14 00:53:00 2016-07-14 0030 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 35.00 86.58 89.50 932.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8258 0
## TRUE 0 560
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 1 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-02-24 19:51:00 2016-02-24 1858 <NA> NA
##
## Extraneous End time
## # A tibble: 1 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-11-20 08:51:00 2016-11-20 0758 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 17.0 41.0 97.3 104.0 858.0
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8195 0
## TRUE 0 626
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 1 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-05-02 17:53:00 2016-05-02 1739 <NA> NA
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 12.00 28.00 61.16 63.00 857.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8378 0
## TRUE 0 425
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-06-26 14:53:00 KGRR 261453Z 25005KT 1 1/2SM RA BR SCT049 BKN090 OVC110 2~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-07-24 15:53:00 KGRR 241553Z 14004KT 10SM TS FEW048CB BKN080 OVC110 22/20~
##
## Extraneous Begin time
## # A tibble: 2 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-06-26 13:53:00 2016-06-26 1346 <NA> NA
## 2 2016-08-12 09:53:00 2016-08-12 0854 <NA> NA
##
## Extraneous End time
## # A tibble: 1 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-04-07 12:53:00 2016-04-07 1215 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 17.00 40.50 94.02 111.75 901.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8208 0
## TRUE 0 604
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 2 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-08-14 00:53:00 KIAH 140053Z 00000KT 4SM TSRA BR SCT007 SCT020 BKN045CB O~
## 2 2016-08-14 22:53:00 KIAH 142253Z 09008KT 1/2SM R27/1600V2400FT -TSRA FEW006 S~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-12-04 10:53:00 KIAH 041053Z 32007KT 8SM OVC005 11/10 A2993 RMK AO2
##
## Extraneous Begin time
## # A tibble: 1 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-08-21 19:53:00 2016-08-21 1857 2016-08-21 1945 NA
##
## Extraneous End time
## # A tibble: 2 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-06-06 00:53:00 2016-06-06 0040 <NA> NA
## 2 2016-12-03 18:53:00 2016-12-03 1812 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -18.00 13.00 28.00 59.71 65.00 845.00
##
## Problem Detected - Intervals are not positive. Data to help investigate
##
## Vector of Begins
## [1] "2016-08-19 18:36:00 UTC" "2016-08-20 18:04:00 UTC"
## [3] "2016-08-20 19:15:00 UTC" "2016-08-21 07:52:00 UTC"
## [5] "2016-08-21 14:55:00 UTC" "2016-08-21 19:45:00 UTC"
## [7] "2016-08-22 16:26:00 UTC" "2016-08-23 20:57:00 UTC"
## [9] "2016-08-24 20:02:00 UTC" "2016-08-25 18:12:00 UTC"
## [11] "2016-08-26 11:41:00 UTC"
##
## Vector of Ends
## [1] "2016-08-19 19:24:00 UTC" "2016-08-20 18:49:00 UTC"
## [3] "2016-08-20 19:49:00 UTC" "2016-08-21 08:10:00 UTC"
## [5] "2016-08-21 15:21:00 UTC" "2016-08-21 19:27:00 UTC"
## [7] "2016-08-22 20:26:00 UTC" "2016-08-23 21:09:00 UTC"
## [9] "2016-08-24 20:41:00 UTC" "2016-08-25 20:23:00 UTC"
## [11] "2016-08-26 12:26:00 UTC"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8367 1
## TRUE 0 448
##
## Mismatch at time 2016-03-23 21:53 UTC
## [1] "KIAH 231953Z 17012G18KT 10SM BKN032 OVC050 23/17 A2986 RMK AO2 SLP110 T02280167"
## [2] "KIAH 232053Z 16017G24KT 10SM SCT030 OVC048 22/18 A2984 RMK AO2 RAB02E12 SLP104 SCT V BKN P0000 60000 T02220178 56021"
## [3] "KIAH 232153Z 17011KT 10SM SCT029 OVC050 22/18 A2983 RMK AO2 RAB2058E13 SLP102 P0000 T02170178"
## [4] "KIAH 232253Z 16010KT 10SM SCT030 OVC055 21/18 A2982 RMK AO2 SLP098 RAB2145E16 P0000 T02060183"
## [5] "KIAH 232353Z 16014G19KT 10SM BKN018 OVC050 20/18 A2982 RMK AO2 SLP096 60000 T02000183 10244 20200 56008"
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-07-18 17:54:00 KIND 181754Z 24013KT 10SM -TSRA BKN030CB OVC040 24/21 A30~
##
## Need End time
## # A tibble: 3 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-04-07 23:54:00 KIND 072354Z 30011KT 10SM BKN037 BKN060 04/M01 A2980 RMK ~
## 2 2016-05-08 19:54:00 KIND 081954Z 19007KT 10SM FEW010 OVC080 14/09 A2995 RMK A~
## 3 2016-08-07 19:54:00 KIND 071954Z 29007KT 10SM SCT060 SCT250 31/16 A2999 RMK A~
##
## Extraneous Begin time
## # A tibble: 1 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-09-05 21:54:00 2016-09-05 2101 <NA> NA
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 16.00 39.50 85.24 97.75 886.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8144 0
## TRUE 0 623
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-04-30 23:56:00 KLAS 302356Z 04026G31KT 6SM TSRA SCT035 BKN080CB OVC100 1~
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.0 16.0 29.0 52.1 57.5 502.0
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8710 0
## TRUE 0 108
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 2 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-08-29 22:54:00 KLNK 292254Z 35010KT 8SM -TSRA FEW070 BKN100 OVC120 23/20~
## 2 2016-09-16 06:54:00 KLNK 160654Z 36015G25KT 6SM RA BR FEW070 OVC110 16/15 A30~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-07-13 18:54:00 KLNK 131854Z 26005KT 10SM CLR 32/16 A2988 RMK AO2 SLP106 ~
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 34.00 67.63 82.00 522.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8500 0
## TRUE 0 313
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 4 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-04-10 10:52:00 KMKE 101052Z 16017G26KT 8SM -PLRASN FEW046 BKN065 OVC075 ~
## 2 2016-06-10 03:52:00 KMKE 100352Z 20005KT 10SM -TSRA SCT070CB BKN090 18/16 A29~
## 3 2016-08-20 13:52:00 KMKE 201352Z 17010KT 10SM -RA FEW016 FEW035 BKN090 BKN150~
## 4 2016-10-07 02:52:00 KMKE 070252Z 18007KT 3SM VCTS -RA BR FEW030 BKN070 BKN095~
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 14.0 32.0 71.8 73.0 1335.0
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8346 0
## TRUE 0 462
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-09-16 11:53:00 KMSN 161153Z 16011KT 10SM FEW090 SCT110 OVC150 18/16 A299~
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 1 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-09-07 12:53:00 2016-09-07 1252 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 13.00 29.00 70.11 74.75 990.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8347 0
## TRUE 0 451
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 2 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-03-24 21:51:00 KORD 242151Z 34014KT 1 3/4SM -RASN BR SCT010 OVC016 01/00~
## 2 2016-08-29 19:51:00 KORD 291951Z 25009KT 1SM R10L/1200VP6000FT +TSRA BKN041CB~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-07-13 18:51:00 KORD 131851Z 22009G18KT 10SM FEW027 SCT047 BKN250 28/22 A~
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 14.00 33.50 70.51 83.50 1243.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8349 0
## TRUE 0 466
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Need End time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Extraneous Begin time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, b1 <chr>, b2 <chr>, b3 <lgl>
##
## Extraneous End time
## # A tibble: 0 x 4
## # ... with 4 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 12.00 21.00 55.53 51.75 638.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8612 0
## TRUE 0 198
##
## Full matches between METAR observations and intervals
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-09-28 19:53:00 KTVC 281953Z 36010KT 10SM FEW027 FEW100 SCT250 19/11 A300~
##
## Extraneous Begin time
## # A tibble: 1 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-06-06 21:53:00 2016-06-06 2117 <NA> NA
##
## Extraneous End time
## # A tibble: 3 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-03-28 10:53:00 2016-03-28 0957 <NA> NA
## 2 2016-06-06 20:53:00 2016-06-06 2049 <NA> NA
## 3 2016-12-02 16:53:00 2016-12-02 1629 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 13.50 30.00 66.53 68.00 642.00
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8345 0
## TRUE 0 469
##
## Full matches between METAR observations and intervals
Initial outputs are encouraging:
Further diagnostics of the Houston situation include:
# August 21, 2016 issue
kiah_2016 %>%
filter(lubridate::date(dtime)=="2016-08-21", lubridate::hour(dtime) %in% c(18, 19, 20, 21)) %>%
pull(origMETAR)
## [1] "KIAH 211853Z 08005KT 10SM FEW018 FEW033 SCT090 BKN150 BKN180 BKN250 29/22 A3001 RMK AO2 SLP160 CB DSNT SW T02890222"
## [2] "KIAH 211953Z 13009KT 10SM SCT029 BKN049 BKN100 BKN150 BKN250 27/24 A2999 RMK AO2 RAB1857E27B45RAEMM SLP152 P0000 T02720244 $"
## [3] "KIAH 212053Z VRB05KT 10SM FEW017 SCT029 SCT050 BKN160 BKN250 30/24 A2998 RMK AO2 SLP150 60000 T03000244 58012"
## [4] "KIAH 212153Z 12009KT 10SM BKN023 BKN160 BKN250 29/24 A2997 RMK AO2 SLP148 BKN023 V SCT T02940239"
# March 23, 2016 issue
kiah_2016 %>%
filter(lubridate::date(dtime)=="2016-03-23", lubridate::hour(dtime) %in% c(20, 21, 22, 23)) %>%
pull(origMETAR)
## [1] "KIAH 232053Z 16017G24KT 10SM SCT030 OVC048 22/18 A2984 RMK AO2 RAB02E12 SLP104 SCT V BKN P0000 60000 T02220178 56021"
## [2] "KIAH 232153Z 17011KT 10SM SCT029 OVC050 22/18 A2983 RMK AO2 RAB2058E13 SLP102 P0000 T02170178"
## [3] "KIAH 232253Z 16010KT 10SM SCT030 OVC055 21/18 A2982 RMK AO2 SLP098 RAB2145E16 P0000 T02060183"
## [4] "KIAH 232353Z 16014G19KT 10SM BKN018 OVC050 20/18 A2982 RMK AO2 SLP096 60000 T02000183 10244 20200 56008"
The August 21 situation highlights an area where the guard logic needs improvement:
The March 23 situation highlights an error in the METAR:
The New Orleans data are run stand-alone since there were a few more issues:
tmpMSY <- combinedPrecipitation(kmsy_2016, pType="(?<!FZ)RA", frameName="kmsy_2016")
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 0 x 2
## # ... with 2 variables: dtime <dttm>, origMETAR <chr>
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-03-27 16:53:00 KMSY 271653Z 12010KT 10SM BKN008 BKN030CB BKN060 OVC130 2~
##
## Extraneous Begin time
## # A tibble: 2 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-03-11 20:53:00 2016-03-11 2007 <NA> NA
## 2 2016-08-14 23:53:00 2016-08-14 2339 <NA> NA
##
## Extraneous End time
## # A tibble: 2 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-12-13 23:53:00 2016-12-13 2258 <NA> NA
## 2 2017-01-01 09:53:00 2017-01-01 0852 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
## Warning in unclass(time1) - unclass(time2): longer object length is not a
## multiple of shorter object length
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -528234 15 30 -1142 66 738
##
## Problem Detected - Intervals are not positive. Data to help investigate
##
## Vector of Begins
## [1] "2017-01-01 13:06:00 UTC" "2017-01-01 17:04:00 UTC"
## [3] "2017-01-01 18:41:00 UTC" "2017-01-01 19:51:00 UTC"
## [5] "2017-01-01 20:15:00 UTC"
##
## Vector of Ends
## [1] "2017-01-01 13:48:00 UTC" "2017-01-01 17:23:00 UTC"
## [3] "2017-01-01 18:54:00 UTC" "2017-01-01 19:55:00 UTC"
## [5] "2017-01-01 20:49:00 UTC"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8375 0
## TRUE 3 435
##
## Mismatch at time 2017-01-01 21:53 UTC
## [1] "KMSY 011953Z 20015G21KT 10SM +RA FEW022 SCT046 BKN065 OVC130 24/22 A2991 RMK AO2 LTG DSNT N RAE1854B51 SLP132 P0000 T02440222 $"
## [2] "KMSY 012053Z 28007KT 9SM OVC024 19/19 A2993 RMK AO2 WSHFT 1957 LTG DSNT NE AND W RAE1955B15E49 SLP140 CB DSNT N P0007 60013 T01940189 55004 $"
## [3] "KMSY 012153Z 29007KT 5SM -TSRA BR BKN028CB OVC035 19/19 A2995 RMK AO2 LTG DSNT W RAB2055 TSB31 SLP145 OCNL LTGICCG W-N TS W-N MOV NE P0004 T01890189 $"
## [4] "KMSY 012253Z 29007KT 5SM RA BR BKN032CB BKN041 OVC050 18/18 A2995 RMK AO2 LTG DSNT SE-W TSE06 SLP146 CB ALQDS MOV NE P0019 T01830183 $"
## [5] "KMSY 012353Z 22003KT 3SM RA BR FEW034 SCT041 OVC055 18/18 A2998 RMK AO2 LTG DSNT E SLP155 CB ALQDS MOV NE P0028 60064 T01780178 10244 20178 53016 $"
##
## Mismatch at time 2017-01-01 22:53 UTC
## [1] "KMSY 012053Z 28007KT 9SM OVC024 19/19 A2993 RMK AO2 WSHFT 1957 LTG DSNT NE AND W RAE1955B15E49 SLP140 CB DSNT N P0007 60013 T01940189 55004 $"
## [2] "KMSY 012153Z 29007KT 5SM -TSRA BR BKN028CB OVC035 19/19 A2995 RMK AO2 LTG DSNT W RAB2055 TSB31 SLP145 OCNL LTGICCG W-N TS W-N MOV NE P0004 T01890189 $"
## [3] "KMSY 012253Z 29007KT 5SM RA BR BKN032CB BKN041 OVC050 18/18 A2995 RMK AO2 LTG DSNT SE-W TSE06 SLP146 CB ALQDS MOV NE P0019 T01830183 $"
## [4] "KMSY 012353Z 22003KT 3SM RA BR FEW034 SCT041 OVC055 18/18 A2998 RMK AO2 LTG DSNT E SLP155 CB ALQDS MOV NE P0028 60064 T01780178 10244 20178 53016 $"
##
## Mismatch at time 2017-01-01 23:53 UTC
## [1] "KMSY 012153Z 29007KT 5SM -TSRA BR BKN028CB OVC035 19/19 A2995 RMK AO2 LTG DSNT W RAB2055 TSB31 SLP145 OCNL LTGICCG W-N TS W-N MOV NE P0004 T01890189 $"
## [2] "KMSY 012253Z 29007KT 5SM RA BR BKN032CB BKN041 OVC050 18/18 A2995 RMK AO2 LTG DSNT SE-W TSE06 SLP146 CB ALQDS MOV NE P0019 T01830183 $"
## [3] "KMSY 012353Z 22003KT 3SM RA BR FEW034 SCT041 OVC055 18/18 A2998 RMK AO2 LTG DSNT E SLP155 CB ALQDS MOV NE P0028 60064 T01780178 10244 20178 53016 $"
New Orleans is interesting in that the final hours of the final day of the pulled METAR are all raining. This leads (correctly) to no final RAE in the METAR, and the logic should be updated to place an artificial RAE in the last minute of the file. Without that, there is recycling, and the interval becomes -365 days. There is a similar need to add logic in case the beginning state is RAB.
The Minneapolis data are run stand-alone since there are many issues:
tmpMSP <- combinedPrecipitation(kmsp_2016, pType="(?<!FZ)RA", frameName="kmsp_2016", maxConsistencyPrint = 5)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## Need Begin time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-09-06 07:53:00 KMSP 060753Z 22007G17KT 6SM -TSRA FEW023 BKN049CB OVC095 ~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-11-22 13:53:00 KMSP 221353Z 13018KT 10SM -PL OVC055 01/M06 A3015 RMK AO2~
##
## Extraneous Begin time
## # A tibble: 4 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-05-29 19:53:00 2016-05-29 1931 <NA> NA
## 2 2016-06-20 04:53:00 2016-06-20 0411 <NA> NA
## 3 2016-09-06 08:53:00 2016-09-06 0744 2016-09-06 0817 NA
## 4 2016-11-29 07:53:00 2016-11-29 0718 <NA> NA
##
## Extraneous End time
## # A tibble: 2 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-04-05 15:53:00 2016-04-05 1522 <NA> NA
## 2 2016-10-18 04:53:00 2016-10-18 0435 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
## Warning in unclass(time1) - unclass(time2): longer object length is not a
## multiple of shorter object length
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -476532.0 18.0 67.5 -818.0 188.0 28458.0
##
## Problem Detected - Intervals are not positive. Data to help investigate
##
## Vector of Begins
## [1] "2016-12-05 18:06:00 UTC" "2016-12-05 18:50:00 UTC"
## [3] "2016-12-06 00:35:00 UTC" "2016-12-23 17:02:00 UTC"
## [5] "2016-12-25 17:59:00 UTC"
##
## Vector of Ends
## [1] "2016-12-05 18:59:00 UTC" "2016-12-06 00:50:00 UTC"
## [3] "2016-12-23 17:45:00 UTC" "2016-12-25 22:35:00 UTC"
## [5] "2016-12-26 05:07:00 UTC"
##
##
## Potential problem Detected - very long. Data to help investigate
##
## Positions with problems are: 260 266 268 269 270 271 278 280 288 289 294 298 303 304 312 314 319 321 327 328 350 353 354
## Vector of Begins
## [1] "2016-09-07 01:55:00 UTC" "2016-09-07 02:22:00 UTC"
## [3] "2016-09-07 12:44:00 UTC" "2016-09-07 14:12:00 UTC"
## [5] "2016-09-07 15:08:00 UTC" "2016-09-07 17:02:00 UTC"
## [7] "2016-09-09 00:52:00 UTC" "2016-09-09 05:36:00 UTC"
## [9] "2016-09-09 07:51:00 UTC" "2016-09-09 11:49:00 UTC"
## [11] "2016-09-09 15:33:00 UTC"
##
## Vector of Ends
## [1] "2016-09-07 02:57:00 UTC" "2016-09-07 13:19:00 UTC"
## [3] "2016-09-07 14:57:00 UTC" "2016-09-07 15:44:00 UTC"
## [5] "2016-09-07 17:11:00 UTC" "2016-09-09 01:01:00 UTC"
## [7] "2016-09-09 05:53:00 UTC" "2016-09-09 09:31:00 UTC"
## [9] "2016-09-09 12:39:00 UTC" "2016-09-09 15:57:00 UTC"
## [11] "2016-09-10 06:10:00 UTC"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 5875 2515
## TRUE 0 427
##
## Mismatch at time 2016-09-06 08:53 UTC
## [1] "KMSP 060653Z 21010G17KT 10SM BKN050 BKN060 OVC090 27/22 A2983 RMK AO2 SLP097 T02720217"
## [2] "KMSP 060753Z 22007G17KT 6SM -TSRA FEW023 BKN049CB OVC095 26/22 A2982 RMK AO2 PK WND 22027/0742 LTG DSNT NE AND S SLP092 TSB35 OCNL LTGICCG OHD TS OHD MOV NE P0015 T02560222"
## [3] "KMSP 060853Z 24003KT 8SM TS BKN049CB BKN060 OVC080 25/22 A2981 RMK AO2 LTG DSNT ALQDS SLP088 RAB0744E01B17 TSB0735E12B22 OCNL LTGICCG OHD TS OHD MOV NE P0001 60016 T02500222 58008"
## [4] "KMSP 060953Z 25007KT 4SM -TSRA BR SCT027CB BKN041 OVC080 24/22 A2981 RMK AO2 LTG DSNT ALQDS RAB34 TSE01B13E28B38 SLP089 FRQ LTGICCG OHD TS OHD MOV NE P0020 T02390222"
## [5] "KMSP 061053Z 18003KT 5SM -TSRA BR FEW029 BKN045CB OVC065 24/22 A2981 RMK AO2 LTG DSNT ALQDS RAE01B14E28B46 TSE02B12E36B41 SLP088 OCNL LTGICCG N-E VCTS N-E MOV NE P0005 T02440222"
##
## Mismatch at time 2016-09-06 14:53 UTC
## [1] "KMSP 061253Z 00000KT 9SM -RA FEW007 SCT015 BKN100 OVC150 22/21 A2989 RMK AO2 LTG DSNT SE RAE12B37 SLP116 P0004 T02220206"
## [2] "KMSP 061353Z 21007KT 8SM -RA FEW010 OVC090 22/21 A2996 RMK AO2 RAE00B21 SLP140 P0007 T02220206"
## [3] "KMSP 061453Z 08007KT 10SM FEW005 FEW110 BKN150 BKN250 22/21 A2989 RMK AO2 LTG DSNT W RAE47 SLP116 P0006 60017 T02220206 58006"
## [4] "KMSP 061553Z VRB05KT 10SM -RA SCT009 SCT100 BKN120 BKN150 22/21 A2992 RMK AO2 LTG DSNT SW RAB13 SLP125 P0003 T02220206"
## [5] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
##
## Mismatch at time 2016-09-06 16:53 UTC
## [1] "KMSP 061453Z 08007KT 10SM FEW005 FEW110 BKN150 BKN250 22/21 A2989 RMK AO2 LTG DSNT W RAE47 SLP116 P0006 60017 T02220206 58006"
## [2] "KMSP 061553Z VRB05KT 10SM -RA SCT009 SCT100 BKN120 BKN150 22/21 A2992 RMK AO2 LTG DSNT SW RAB13 SLP125 P0003 T02220206"
## [3] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
## [4] "KMSP 061753Z 20004KT 10SM FEW010 FEW150 BKN180 BKN250 22/19 A2994 RMK AO2 LTG DSNT SE AND S SLP132 60022 T02220194 10228 20217 51016"
## [5] "KMSP 061853Z 21004KT 10SM SCT011 BKN033 BKN150 BKN250 24/20 A2995 RMK AO2 SLP137 T02390200"
##
## Mismatch at time 2016-09-06 17:53 UTC
## [1] "KMSP 061553Z VRB05KT 10SM -RA SCT009 SCT100 BKN120 BKN150 22/21 A2992 RMK AO2 LTG DSNT SW RAB13 SLP125 P0003 T02220206"
## [2] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
## [3] "KMSP 061753Z 20004KT 10SM FEW010 FEW150 BKN180 BKN250 22/19 A2994 RMK AO2 LTG DSNT SE AND S SLP132 60022 T02220194 10228 20217 51016"
## [4] "KMSP 061853Z 21004KT 10SM SCT011 BKN033 BKN150 BKN250 24/20 A2995 RMK AO2 SLP137 T02390200"
## [5] "KMSP 061953Z 23005KT 10SM FEW011 BKN036 BKN070 BKN250 24/20 A2996 RMK AO2 RAB20E49 SLP140 P0000 T02440200"
##
## Mismatch at time 2016-09-06 18:53 UTC
## [1] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
## [2] "KMSP 061753Z 20004KT 10SM FEW010 FEW150 BKN180 BKN250 22/19 A2994 RMK AO2 LTG DSNT SE AND S SLP132 60022 T02220194 10228 20217 51016"
## [3] "KMSP 061853Z 21004KT 10SM SCT011 BKN033 BKN150 BKN250 24/20 A2995 RMK AO2 SLP137 T02390200"
## [4] "KMSP 061953Z 23005KT 10SM FEW011 BKN036 BKN070 BKN250 24/20 A2996 RMK AO2 RAB20E49 SLP140 P0000 T02440200"
## [5] "KMSP 062053Z 19003KT 10SM FEW018 BKN041 BKN130 BKN250 25/19 A2994 RMK AO2 LTG DSNT SE AND S SLP134 60000 T02500189 50002"
testMSP <- tibble::tibble(beginTime=tmpMSP$testAllBegins,
endTime=c(tmpMSP$testAllEnds, NA), # make the same length as begins
ints=tmpMSP$testIntervals
)
testMSP %>%
filter(ints >= 2000 | lubridate::date(beginTime) == "2016-09-06") %>%
as.data.frame()
## beginTime endTime ints
## 1 2016-09-06 07:53:00 2016-09-06 08:01:00 8 mins
## 2 2016-09-06 08:17:00 2016-09-06 10:01:00 104 mins
## 3 2016-09-06 09:34:00 2016-09-06 10:28:00 54 mins
## 4 2016-09-06 10:14:00 2016-09-06 11:01:00 47 mins
## 5 2016-09-06 10:46:00 2016-09-06 12:12:00 86 mins
## 6 2016-09-06 11:16:00 2016-09-06 13:00:00 104 mins
## 7 2016-09-06 12:37:00 2016-09-06 14:47:00 130 mins
## 8 2016-09-06 13:21:00 2016-09-06 16:40:00 199 mins
## 9 2016-09-06 15:13:00 2016-09-06 19:49:00 276 mins
## 10 2016-09-06 19:20:00 2016-09-07 01:25:00 365 mins
## 11 2016-09-10 05:46:00 2016-09-13 23:51:00 5405 mins
## 12 2016-09-14 01:09:00 2016-09-16 02:19:00 2950 mins
## 13 2016-09-17 01:07:00 2016-09-19 16:05:00 3778 mins
## 14 2016-09-19 14:58:00 2016-09-21 05:04:00 2286 mins
## 15 2016-09-22 11:01:00 2016-09-23 23:46:00 2205 mins
## 16 2016-09-25 22:14:00 2016-09-28 00:06:00 2992 mins
## 17 2016-09-27 21:33:00 2016-10-04 15:07:00 9694 mins
## 18 2016-10-05 01:09:00 2016-10-06 22:52:00 2743 mins
## 19 2016-10-07 06:46:00 2016-10-12 04:37:00 7071 mins
## 20 2016-10-12 10:59:00 2016-10-15 22:57:00 5038 mins
## 21 2016-10-18 05:48:00 2016-10-26 13:17:00 11969 mins
## 22 2016-10-26 16:18:00 2016-10-29 17:11:00 4373 mins
## 23 2016-10-29 22:48:00 2016-11-18 17:06:00 28458 mins
## 24 2016-11-18 18:13:00 2016-11-22 13:53:00 5500 mins
## 25 2016-11-22 22:53:00 2016-11-25 00:21:00 2968 mins
## 26 2016-11-24 22:49:00 2016-11-28 05:16:00 4707 mins
## 27 2016-12-01 07:20:00 2016-12-05 18:28:00 6428 mins
## 28 2016-12-06 00:35:00 2016-12-23 17:45:00 25510 mins
## 29 2016-12-23 17:02:00 2016-12-25 22:35:00 3213 mins
The Minneapolis data appear to first become misaligned during a September 6 rain event.
Next steps are to investigate and fix the following:
The functions that need updating are suggestBeginEndTimes and convertExceptionsToTimes and getBeginEndTimeMatrix:
# Updated function to account for:
# 1. Remove any RAB/RAE observations that have dddd that 'belongs' to a different hourly METAR
# 2. Check whether overBegin or overEnd have 2+ causes, and delete all that are necessary
# 3. Upgrade logic to manage a starting state of precipitation, or an ending state of precipitation
# 4. Select the correct RAB/RAE to discard in cases like RABnnEnnBnn where net change should be 0
# Helper function to extract the beginning and ending times using str_extract_all
# Updated to check that four-digit times are within the proper hour
getBeginEndTimeMatrix <- function(file,
pullVar="precipData",
pState="B",
timeVar=NULL
) {
# FUNCTION ARGUMENTS:
# file: file containing data
# pullVar: the variable of interest in file
# pState: the precipitation state of interest - B (begin) or E (end)
# timeVar: the datetime variable in file which can be used to check consistency (NULL means no checks)
# The basic extraction process
mtxBET <- file %>%
pull(pullVar) %>%
str_extract_all(paste0(pState, "\\d+"), simplify=TRUE) %>%
as.data.frame(stringsAsFactors=FALSE) %>%
tibble::as_tibble()
# If time variable has been passed, check for consistency of extracted times
if (!is.null(timeVar)) {
chk <- mtxBET %>%
mutate(time=pull(file, timeVar)) %>%
pivot_longer(-time, names_to="column", values_to="value") %>%
mutate(convVal=str_replace(value, pattern="B", replacement=""),
keyMin=lubridate::minute(time),
delta=ifelse(str_length(convVal)==4, as.integer(str_sub(convVal, 3, -1))-keyMin, NA)
)
issues <- chk %>%
filter(delta <= 0)
if (nrow(issues) > 0) {
# Flag the issue
cat("\nMETAR has begin or end time belonging to a different record - will be deleted\n")
print(issues)
# Delete the issue
chk <- chk %>%
filter(is.na(delta) | delta > 0)
# Fix the V1-Vn due to deletions
chk <- chk %>%
group_by(time) %>%
mutate(column=paste0("V", row_number())) %>%
ungroup()
# Update mtxBET
mtxBET <- chk %>%
select(time, column, value) %>%
pivot_wider(names_from=column, values_from=value) %>%
select(-time)
}
}
# Return the matrix (converted to data frame)
mtxBET
}
# Update to pull any cases of 2+ overBegin or 2+ overEnd
convertExceptionsToTimes <- function(excFile, maxCol=3) {
# Basic beginAdd (just add the METAR time)
beginAdd <- excFile %>%
filter(needBegin) %>%
pull(dtime) %>%
timeFormatter()
# Basic endAdd (just add the METAR time)
endAdd <- excFile %>%
filter(needEnd) %>%
pull(dtime) %>%
timeFormatter()
# Basic endExclude (just delete the earliest end time - maybe add a guard check later)
endExclude <- excFile %>%
filter(overEnd) %>%
select_at(vars(all_of(c("dtime", "nOverEnd", paste0("e", 1:maxCol))))) %>%
pivot_longer(-c(dtime, nOverEnd), names_to="column", values_to="time") %>%
filter(as.integer(str_replace(column, pattern="e", replacement="")) <= nOverEnd) %>%
pull(time)
# Basic beginExclude (just delete the earliest begin time - maybe add a guard check later)
beginExclude <- excFile %>%
filter(overBegin) %>%
select_at(vars(all_of(c("dtime", "nOverBegin", paste0("b", 1:maxCol))))) %>%
pivot_longer(-c(dtime, nOverBegin), names_to="column", values_to="time") %>%
filter(as.integer(str_replace(column, pattern="b", replacement="")) <= nOverBegin) %>%
pull(time)
list(endAdd=endAdd, beginAdd=beginAdd, endExclude=endExclude, beginExclude=beginExclude)
}
# Updated to pull out the number of overBegin and overEnd
suggestBeginEndTimes <- function(df,
regMatch,
nColMax=3,
printNACols=FALSE,
printIssues=TRUE
) {
# FUNCTION ARGUMENTS
# df: the data frame to be examined
# regMatch: the relevant regex extraction code
# nColMax: the maximum number of columns allowed for precipitation begins and precipitation ends
# printNACols: boolean, whether to print the NA column sums
# printIssues: boolean, whether to print out all of the issues
# Pull the data and check for the specified precipitation pattern and lags
# Needs to follow the \\d{6}Z that shows the time and precede the RMK that denotes the remarks
sugStates <- df %>%
select(dtime, origMETAR) %>%
mutate(curPrecip=str_detect(origMETAR, paste0("\\d{6}Z.*", regMatch, ".*RMK")),
lagPrecip=lag(curPrecip, 1)
)
# Use the analysis data to look for begins and ends flagged in the remarks
sugBE <- extractPrecipData(sugStates, pType=regMatch)
# Inner join the data by dtime
sugBEJoin <- sugBE %>%
select(dtime, precipData, chgPrecip=isPrecip, dateUTC, hourUTC)
sugStates <- sugStates %>%
inner_join(sugBEJoin, by="dtime")
# Get the beginning and end times data for the desired precipitation type
sugBegin <- getBeginEndTimeMatrix(sugStates, pState="B", timeVar="dtime")
sugEnd <- getBeginEndTimeMatrix(sugStates, pState="E", timeVar="dtime")
# Ensure that user-defined maximum number of columns is not exceeded
if (ncol(sugBegin) > nColMax | ncol(sugEnd) > nColMax) {
stop("Function argument set to allow only for 0 - ", nColMax, "columns, investigate and fix")
}
# For each iteration, create testBTn and testETn (use NA if no column)
for (ctr in 1:nColMax) {
# Create NA as the baseline
assign(paste0("testBT", ctr), NA)
assign(paste0("testET", ctr), NA)
# Assign begin if exists
if (ncol(sugBegin) >= ctr) {
bt <- getBeginEndTimeVector(sugBegin, sugStates, extractVar=paste0("V", ctr), extractSym="B")
assign(paste0("testBT", ctr), bt)
}
# Assign end if exists
if (ncol(sugEnd) >= ctr) {
et <- getBeginEndTimeVector(sugEnd, sugStates, extractVar=paste0("V", ctr), extractSym="E")
assign(paste0("testET", ctr), et)
}
}
# Integrate to a single file, allowing for nMaxCol applications of variables
# Create file sugExceptions as sugStates and initialize the columns for begins and ends
sugExceptions <- sugStates %>%
mutate(begins=0, ends=0)
# Loop through up to nColMax, adding the appropriate bn=testBTn and en=testETn
for (ctr in 1:nColMax) {
sugExceptions <- sugExceptions %>%
mutate(!!paste0("b", ctr) := get(paste0("testBT", ctr)),
!!paste0("e", ctr) := get(paste0("testET", ctr)),
begins=begins + ifelse(is.na(get(paste0("b", ctr))), 0, 1),
ends=ends + ifelse(is.na(get(paste0("e", ctr))), 0, 1)
)
}
# Create the final version by checking key criteria
# Add a criteria for b before e if previous; and e before b if not previous?
# Potential issues include:
# Current and lagged state are the same when begins != ends
# Current is no, lagged is yes, ends <= begins
# Current is yes, lagged is no, begins <= ends
sugExceptions <- sugExceptions %>%
mutate(issueA=(lagPrecip==curPrecip & begins!=ends), # no state change should have begins==ends
issueB=(lagPrecip & !curPrecip & ends <= begins), # change to 'no' should have ends>bsgins
issueC=(curPrecip & !lagPrecip & begins <= ends), # change to 'yse' should have begins>ends
issueD=abs(begins-ends) > 1, # state is a 1/0 so net change should be -1, 0, or 1
netBeginEndShouldBe=(curPrecip - lagPrecip),
netBeginEndIs=(begins - ends),
needBegin=issueC & begins==0,
needEnd=issueB & ends==0,
overBegin=(issueA & (begins > ends)) | (issueB & (ends > 0)),
overEnd=(issueA & (ends > begins)) | (issueC & (begins > 0)),
nOverBegin=pmax(0, netBeginEndIs-netBeginEndShouldBe-needEnd),
nOverEnd=pmax(0, netBeginEndShouldBe-netBeginEndIs-needBegin),
tooManyBE=issueD
)
# Report on the NA status by column
if (printNACols) {
colSums(is.na(sugExceptions)) %>%
print()
}
# Print out the issue if requested
if (printIssues) {
printBeginEndIssues(sugExceptions, nColMax=nColMax)
}
sugExceptions %>%
filter(issueA | issueB | issueC | issueD)
}
With the logic updated to correct for times in the wrong METAR as well as 2+ occurrences of overBegin or overEnd, the process is re-run for Minneapolis:
tmpMSP2 <- combinedPrecipitation(kmsp_2016,
pType="(?<!FZ)RA",
frameName="kmsp_2016",
maxConsistencyPrint = 5
)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
##
## METAR has begin or end time belonging to a different record - will be deleted
## # A tibble: 1 x 6
## time column value convVal keyMin delta
## <dttm> <chr> <chr> <chr> <int> <int>
## 1 2016-09-06 08:53:00 V1 B0744 0744 53 -9
##
## Need Begin time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-09-06 07:53:00 KMSP 060753Z 22007G17KT 6SM -TSRA FEW023 BKN049CB OVC095 ~
##
## Need End time
## # A tibble: 1 x 2
## dtime origMETAR
## <dttm> <chr>
## 1 2016-11-22 13:53:00 KMSP 221353Z 13018KT 10SM -PL OVC055 01/M06 A3015 RMK AO2~
##
## Extraneous Begin time
## # A tibble: 4 x 4
## dtime b1 b2 b3
## <dttm> <chr> <chr> <lgl>
## 1 2016-05-29 19:53:00 2016-05-29 1931 <NA> NA
## 2 2016-06-20 04:53:00 2016-06-20 0411 <NA> NA
## 3 2016-09-06 08:53:00 2016-09-06 0817 <NA> NA
## 4 2016-11-29 07:53:00 2016-11-29 0718 <NA> NA
##
## Extraneous End time
## # A tibble: 2 x 4
## dtime e1 e2 e3
## <dttm> <chr> <chr> <lgl>
## 1 2016-04-05 15:53:00 2016-04-05 1522 <NA> NA
## 2 2016-10-18 04:53:00 2016-10-18 0435 <NA> NA
##
## Wrong amount of begins or ends
## # A tibble: 0 x 7
## # ... with 7 variables: dtime <dttm>, e1 <chr>, e2 <chr>, e3 <lgl>, b1 <chr>,
## # b2 <chr>, b3 <lgl>
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
## Warning in unclass(time1) - unclass(time2): longer object length is not a
## multiple of shorter object length
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -476532.0 18.0 67.5 -817.9 188.0 28458.0
##
## Problem Detected - Intervals are not positive. Data to help investigate
##
## Vector of Begins
## [1] "2016-12-05 18:06:00 UTC" "2016-12-05 18:50:00 UTC"
## [3] "2016-12-06 00:35:00 UTC" "2016-12-23 17:02:00 UTC"
## [5] "2016-12-25 17:59:00 UTC"
##
## Vector of Ends
## [1] "2016-12-05 18:59:00 UTC" "2016-12-06 00:50:00 UTC"
## [3] "2016-12-23 17:45:00 UTC" "2016-12-25 22:35:00 UTC"
## [5] "2016-12-26 05:07:00 UTC"
##
##
## Potential problem Detected - very long. Data to help investigate
##
## Positions with problems are: 260 266 268 269 270 271 278 280 288 289 294 298 303 304 312 314 319 321 327 328 350 353 354
## Vector of Begins
## [1] "2016-09-07 01:55:00 UTC" "2016-09-07 02:22:00 UTC"
## [3] "2016-09-07 12:44:00 UTC" "2016-09-07 14:12:00 UTC"
## [5] "2016-09-07 15:08:00 UTC" "2016-09-07 17:02:00 UTC"
## [7] "2016-09-09 00:52:00 UTC" "2016-09-09 05:36:00 UTC"
## [9] "2016-09-09 07:51:00 UTC" "2016-09-09 11:49:00 UTC"
## [11] "2016-09-09 15:33:00 UTC"
##
## Vector of Ends
## [1] "2016-09-07 02:57:00 UTC" "2016-09-07 13:19:00 UTC"
## [3] "2016-09-07 14:57:00 UTC" "2016-09-07 15:44:00 UTC"
## [5] "2016-09-07 17:11:00 UTC" "2016-09-09 01:01:00 UTC"
## [7] "2016-09-09 05:53:00 UTC" "2016-09-09 09:31:00 UTC"
## [9] "2016-09-09 12:39:00 UTC" "2016-09-09 15:57:00 UTC"
## [11] "2016-09-10 06:10:00 UTC"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 5875 2515
## TRUE 0 427
##
## Mismatch at time 2016-09-06 08:53 UTC
## [1] "KMSP 060653Z 21010G17KT 10SM BKN050 BKN060 OVC090 27/22 A2983 RMK AO2 SLP097 T02720217"
## [2] "KMSP 060753Z 22007G17KT 6SM -TSRA FEW023 BKN049CB OVC095 26/22 A2982 RMK AO2 PK WND 22027/0742 LTG DSNT NE AND S SLP092 TSB35 OCNL LTGICCG OHD TS OHD MOV NE P0015 T02560222"
## [3] "KMSP 060853Z 24003KT 8SM TS BKN049CB BKN060 OVC080 25/22 A2981 RMK AO2 LTG DSNT ALQDS SLP088 RAB0744E01B17 TSB0735E12B22 OCNL LTGICCG OHD TS OHD MOV NE P0001 60016 T02500222 58008"
## [4] "KMSP 060953Z 25007KT 4SM -TSRA BR SCT027CB BKN041 OVC080 24/22 A2981 RMK AO2 LTG DSNT ALQDS RAB34 TSE01B13E28B38 SLP089 FRQ LTGICCG OHD TS OHD MOV NE P0020 T02390222"
## [5] "KMSP 061053Z 18003KT 5SM -TSRA BR FEW029 BKN045CB OVC065 24/22 A2981 RMK AO2 LTG DSNT ALQDS RAE01B14E28B46 TSE02B12E36B41 SLP088 OCNL LTGICCG N-E VCTS N-E MOV NE P0005 T02440222"
##
## Mismatch at time 2016-09-06 14:53 UTC
## [1] "KMSP 061253Z 00000KT 9SM -RA FEW007 SCT015 BKN100 OVC150 22/21 A2989 RMK AO2 LTG DSNT SE RAE12B37 SLP116 P0004 T02220206"
## [2] "KMSP 061353Z 21007KT 8SM -RA FEW010 OVC090 22/21 A2996 RMK AO2 RAE00B21 SLP140 P0007 T02220206"
## [3] "KMSP 061453Z 08007KT 10SM FEW005 FEW110 BKN150 BKN250 22/21 A2989 RMK AO2 LTG DSNT W RAE47 SLP116 P0006 60017 T02220206 58006"
## [4] "KMSP 061553Z VRB05KT 10SM -RA SCT009 SCT100 BKN120 BKN150 22/21 A2992 RMK AO2 LTG DSNT SW RAB13 SLP125 P0003 T02220206"
## [5] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
##
## Mismatch at time 2016-09-06 16:53 UTC
## [1] "KMSP 061453Z 08007KT 10SM FEW005 FEW110 BKN150 BKN250 22/21 A2989 RMK AO2 LTG DSNT W RAE47 SLP116 P0006 60017 T02220206 58006"
## [2] "KMSP 061553Z VRB05KT 10SM -RA SCT009 SCT100 BKN120 BKN150 22/21 A2992 RMK AO2 LTG DSNT SW RAB13 SLP125 P0003 T02220206"
## [3] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
## [4] "KMSP 061753Z 20004KT 10SM FEW010 FEW150 BKN180 BKN250 22/19 A2994 RMK AO2 LTG DSNT SE AND S SLP132 60022 T02220194 10228 20217 51016"
## [5] "KMSP 061853Z 21004KT 10SM SCT011 BKN033 BKN150 BKN250 24/20 A2995 RMK AO2 SLP137 T02390200"
##
## Mismatch at time 2016-09-06 17:53 UTC
## [1] "KMSP 061553Z VRB05KT 10SM -RA SCT009 SCT100 BKN120 BKN150 22/21 A2992 RMK AO2 LTG DSNT SW RAB13 SLP125 P0003 T02220206"
## [2] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
## [3] "KMSP 061753Z 20004KT 10SM FEW010 FEW150 BKN180 BKN250 22/19 A2994 RMK AO2 LTG DSNT SE AND S SLP132 60022 T02220194 10228 20217 51016"
## [4] "KMSP 061853Z 21004KT 10SM SCT011 BKN033 BKN150 BKN250 24/20 A2995 RMK AO2 SLP137 T02390200"
## [5] "KMSP 061953Z 23005KT 10SM FEW011 BKN036 BKN070 BKN250 24/20 A2996 RMK AO2 RAB20E49 SLP140 P0000 T02440200"
##
## Mismatch at time 2016-09-06 18:53 UTC
## [1] "KMSP 061653Z 15004KT 10SM FEW110 BKN150 BKN250 22/19 A2992 RMK AO2 LTG DSNT E RAE40 SLP127 P0002 T02220194"
## [2] "KMSP 061753Z 20004KT 10SM FEW010 FEW150 BKN180 BKN250 22/19 A2994 RMK AO2 LTG DSNT SE AND S SLP132 60022 T02220194 10228 20217 51016"
## [3] "KMSP 061853Z 21004KT 10SM SCT011 BKN033 BKN150 BKN250 24/20 A2995 RMK AO2 SLP137 T02390200"
## [4] "KMSP 061953Z 23005KT 10SM FEW011 BKN036 BKN070 BKN250 24/20 A2996 RMK AO2 RAB20E49 SLP140 P0000 T02440200"
## [5] "KMSP 062053Z 19003KT 10SM FEW018 BKN041 BKN130 BKN250 25/19 A2994 RMK AO2 LTG DSNT SE AND S SLP134 60000 T02500189 50002"
tmpMSP2$testAllBegins[lubridate::date(tmpMSP2$testAllBegins)=="2016-09-06"]
## [1] "2016-09-06 07:44:00 UTC" "2016-09-06 07:53:00 UTC"
## [3] "2016-09-06 09:34:00 UTC" "2016-09-06 10:14:00 UTC"
## [5] "2016-09-06 10:46:00 UTC" "2016-09-06 11:16:00 UTC"
## [7] "2016-09-06 12:37:00 UTC" "2016-09-06 13:21:00 UTC"
## [9] "2016-09-06 15:13:00 UTC" "2016-09-06 19:20:00 UTC"
tmpMSP2$testAllEnds[lubridate::date(tmpMSP2$testAllEnds)=="2016-09-06"]
## [1] "2016-09-06 08:01:00 UTC" "2016-09-06 10:01:00 UTC"
## [3] "2016-09-06 10:28:00 UTC" "2016-09-06 11:01:00 UTC"
## [5] "2016-09-06 12:12:00 UTC" "2016-09-06 13:00:00 UTC"
## [7] "2016-09-06 14:47:00 UTC" "2016-09-06 16:40:00 UTC"
## [9] "2016-09-06 19:49:00 UTC"
The process is correctly noting that it wants to delete the erroneous 0744 record, though it remains in the data causing the same mismatch errors.
The issue appears to be driven by getBeginEndTimeVector, which will be the next function to update.
A revisit to the precipitation extraction seems in order. Broadly, the extraction process is designed to get two things:
Extracting the precipitation state is simple, and is based on ddddddZ…pType…RMK occurring somewhere in the METAR.
Extracting the data for constructing the ranges is based on RMK…pTypeBnnEnnBnn… type records. There is a need to convert the range data to date-time, to check that the ranges are consistent with the extracted precipitation states, and to make a best guess change to the range data to drive the alignment.
This suggests three functions:
Example code includes:
# Function to extract precipitation state and range data from METAR
fnPrecip1 <- function(df, pType, showRegex=TRUE) {
# The remarks pattern is created based on the precipitation type
keyPattern <- paste0("(", pType, "[B|E]\\d+[0-9BE]*)")
if (showRegex) { cat("\nRegex search code is:", keyPattern, "\n") }
# Extract the current precipitation state, lagged precipitation state, and range data from METAR
df <- df %>%
select(origMETAR, dtime) %>%
mutate(strPrecip=str_extract(origMETAR, pattern=keyPattern),
curPrecip=str_detect(origMETAR, pattern=paste0("\\d{6}Z.*", pType, ".*RMK")),
lagPrecip=lag(curPrecip, 1)
)
# Confirm that df is unique by dtime (algorithm depends on this)
dups <- df %>%
select(dtime) %>%
duplicated() %>%
any()
if (dups) {
stop("\nThere are duplicate values of dtime - investigate and fix\n")
}
# Returnt the file
df
}
msp_precip1 <- fnPrecip1(kmsp_2016, pType="(?<!FZ)RA")
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
msp_precip1
## # A tibble: 8,817 x 5
## origMETAR dtime strPrecip curPrecip lagPrecip
## <chr> <dttm> <chr> <lgl> <lgl>
## 1 KMSP 310053Z 27008KT 4SM -~ 2015-12-31 00:53:00 <NA> FALSE NA
## 2 KMSP 310153Z 23011KT 5SM -~ 2015-12-31 01:53:00 <NA> FALSE FALSE
## 3 KMSP 310253Z 25007KT 9SM -~ 2015-12-31 02:53:00 <NA> FALSE FALSE
## 4 KMSP 310353Z 27008KT 10SM ~ 2015-12-31 03:53:00 <NA> FALSE FALSE
## 5 KMSP 310453Z 26007KT 10SM ~ 2015-12-31 04:53:00 <NA> FALSE FALSE
## 6 KMSP 310553Z 23004KT 9SM F~ 2015-12-31 05:53:00 <NA> FALSE FALSE
## 7 KMSP 310653Z 22003KT 8SM O~ 2015-12-31 06:53:00 <NA> FALSE FALSE
## 8 KMSP 310753Z 23005KT 9SM O~ 2015-12-31 07:53:00 <NA> FALSE FALSE
## 9 KMSP 310853Z 22006KT 9SM S~ 2015-12-31 08:53:00 <NA> FALSE FALSE
## 10 KMSP 310953Z 23006KT 9SM C~ 2015-12-31 09:53:00 <NA> FALSE FALSE
## # ... with 8,807 more rows
The next function splits out the B/E data in strPrecip and creates times for each:
# Helper function to zero-pad
zeroPad <- function(x, width=2, side="left", pad="0") {
str_pad(x, width=width, side=side, pad=pad)
}
# Helper function to take a date-time and a minutes and create a new date-time
helperBEDateTime <- function(dt, mins) {
# Create the date as character (lubridate and dplyr do not always work well together)
dateChar <- ifelse(str_length(mins)==4 & str_sub(mins, 1, 2)=="23",
as.character(lubridate::as_date(lubridate::date(dt)-1)),
as.character(lubridate::date(dt))
)
# Create the hours and minutes as character
hrMinChar <- ifelse(str_length(mins)==4,
mins,
paste0(zeroPad(lubridate::hour(dt)), mins)
)
# Return the appropriate date-time
lubridate::ymd_hm(paste0(dateChar, " ", hrMinChar))
}
# Function to create times for each piece of B/E data
fnPrecip2 <- function(df) {
# Create the begin and end times by splitting precipString, then format as tibble and reattach dtime
bEData <- df %>%
pull(strPrecip) %>%
str_extract_all(pattern="[BE]\\d+", simplify=TRUE) %>%
as.data.frame(stringsAsFactors=FALSE) %>%
tibble::as_tibble() %>%
bind_cols(select(df, dtime))
# Pivot longer, create a dummy record where is.na(V1), and split in to time and state change
bEData <- bEData %>%
pivot_longer(-dtime) %>%
mutate(value=ifelse(name=="V1" & is.na(value), paste0("N", lubridate::minute(dtime)), value)) %>%
filter(value != "") %>%
mutate(chgType=str_sub(value, 1, 1),
chgNum=str_sub(value, 2, -1),
chgTime=helperBEDateTime(dt=dtime, mins=chgNum)
)
# Return the data
bEData
}
msp_precip2 <- fnPrecip2(msp_precip1)
msp_precip2
## # A tibble: 9,017 x 6
## dtime name value chgType chgNum chgTime
## <dttm> <chr> <chr> <chr> <chr> <dttm>
## 1 2015-12-31 00:53:00 V1 N53 N 53 2015-12-31 00:53:00
## 2 2015-12-31 01:53:00 V1 N53 N 53 2015-12-31 01:53:00
## 3 2015-12-31 02:53:00 V1 N53 N 53 2015-12-31 02:53:00
## 4 2015-12-31 03:53:00 V1 N53 N 53 2015-12-31 03:53:00
## 5 2015-12-31 04:53:00 V1 N53 N 53 2015-12-31 04:53:00
## 6 2015-12-31 05:53:00 V1 N53 N 53 2015-12-31 05:53:00
## 7 2015-12-31 06:53:00 V1 N53 N 53 2015-12-31 06:53:00
## 8 2015-12-31 07:53:00 V1 N53 N 53 2015-12-31 07:53:00
## 9 2015-12-31 08:53:00 V1 N53 N 53 2015-12-31 08:53:00
## 10 2015-12-31 09:53:00 V1 N53 N 53 2015-12-31 09:53:00
## # ... with 9,007 more rows
Then a function can be written to extract issues where the times are out of sequence:
# Function to extract issues - continuity, distance from METAR
fnPrecip3 <- function(df) {
issueCheck <- df %>%
mutate(deltaMETAR=dtime-chgTime,
deltaPrev=ifelse(row_number()==1, 3600, chgTime-lag(chgTime, 1)),
issueCons=(deltaMETAR < 0 | deltaMETAR > 3599 | deltaPrev <= 0)
)
cat("\nContinuity or consistency error - record(s) will be deleted\n")
issueCheck %>%
filter(issueCons) %>%
print()
issueCheck %>%
select(-deltaMETAR, -deltaPrev) %>%
filter(!issueCons)
}
msp_precip3 <- fnPrecip3(msp_precip2)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-09-06 08:53:00 V1 B0744 B 0744 2016-09-06 07:44:00 4140 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
msp_precip3
## # A tibble: 9,016 x 7
## dtime name value chgType chgNum chgTime issueCons
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2015-12-31 00:53:00 V1 N53 N 53 2015-12-31 00:53:00 FALSE
## 2 2015-12-31 01:53:00 V1 N53 N 53 2015-12-31 01:53:00 FALSE
## 3 2015-12-31 02:53:00 V1 N53 N 53 2015-12-31 02:53:00 FALSE
## 4 2015-12-31 03:53:00 V1 N53 N 53 2015-12-31 03:53:00 FALSE
## 5 2015-12-31 04:53:00 V1 N53 N 53 2015-12-31 04:53:00 FALSE
## 6 2015-12-31 05:53:00 V1 N53 N 53 2015-12-31 05:53:00 FALSE
## 7 2015-12-31 06:53:00 V1 N53 N 53 2015-12-31 06:53:00 FALSE
## 8 2015-12-31 07:53:00 V1 N53 N 53 2015-12-31 07:53:00 FALSE
## 9 2015-12-31 08:53:00 V1 N53 N 53 2015-12-31 08:53:00 FALSE
## 10 2015-12-31 09:53:00 V1 N53 N 53 2015-12-31 09:53:00 FALSE
## # ... with 9,006 more rows
Then a function can be written to find issues where there are the wrong number, or wrong ordering, of begin and end times:
# Function to find wrong sequence of times
fnPrecip4 <- function(dfTimes, dfOrig) {
# Pivot dfTimes back to a single record, discarding any of the consistency issues
# OK if not all dtimes included; will get from dfOrig
dfCheck <- dfTimes %>%
filter(!issueCons) %>%
mutate(newValue=case_when(chgType=="N" ~ 0, chgType=="B" ~ 1, chgType=="E" ~ -1)) %>%
pivot_wider(dtime, names_from="name", values_from="newValue") %>%
right_join(select(dfOrig, dtime, curPrecip, lagPrecip), by="dtime") %>%
mutate_if(is.numeric, ~ifelse(is.na(.), 0, .))
# Create the cumsum of state change
cs <- dfCheck %>%
select(-dtime, -curPrecip) %>%
mutate(lagPrecip=as.integer(lagPrecip)) %>%
select(lagPrecip, everything()) %>%
apply(1, FUN=cumsum) %>%
t()
# Create the list of issues
# If cumsum is every anything other than 1 or 0 there is a problem
# If the last value of cumsum does not equal curPrecip there is a problem
issue01 <- which(apply(cs, 1, FUN=max) > 1 | apply(cs, 1, FUN=min) < 0)
issuelc <- which(dfCheck$curPrecip != cs[, ncol(cs)])
issueAll <- sort(unique(c(issue01, issuelc)))
cat("\n\nIssues with begin when precipitation or end when no precipitation:", length(issue01))
cat("\nIssues where state change does not add to the total:", length(issuelc))
cat("\nTotal issues (may be less than sum due to same datetime in both):", length(issueAll), "\n\n")
# print(dfCheck[issueAll, ] %>% select(dtime, lagPrecip, everything()))
dfCheck[sort(unique(c(issue01, issuelc))), "dtime"]
}
# Get the list of bad date-times
msp_precip4 <- fnPrecip4(msp_precip3, msp_precip1)
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 8
## Total issues (may be less than sum due to same datetime in both): 8
Then, a function can work through only the bad date-times, leaving the good date-times “as is”
# Function to work through each bad date-time and suggest new ranges
fnPrecip5 <- function(df, issueTimes, lagCurFrame) {
# Create a dataframe for issueTimes
issues <- issueTimes %>%
mutate(issue=TRUE)
# Split df in to issues and non-issue
df <- df %>%
left_join(issues, by="dtime") %>%
mutate(issue=ifelse(is.na(issue), FALSE, issue),
lastRecord=(row_number()==n())
) %>%
left_join(select(lagCurFrame, dtime, lagPrecip, curPrecip), by="dtime")
dfIssues <- df %>%
filter(issue)
dfNoIssues <- df %>%
filter(!issue)
# Note the records to be worked through
cat("\nRecords to be addressed include:\n")
dfIssues %>%
select(-issueCons, -issue) %>%
print()
# Work through a record by starting with lagPrecip
# If first record inconsistent with lagPrecip, flag for deletion and keep state; update state otherwise
# if next record inconsistent with previous state, flag for deletion and keep state; update otherwise
# If final record inconsistent with curPrecip, add a record using dtime as the time
# Get the unique times with issues
dtimeVec <- dfIssues %>% pull(dtime) %>% unique()
# Create empty vectors for deletes and adds
delVec <- as.POSIXct(character(0))
addBegin <- as.POSIXct(character(0))
addEnd <- as.POSIXct(character(0))
# Populate the delete and add vectors
for (dt in dtimeVec) {
# Pull the records for this time
dtRecords <- dfIssues %>% filter(dtime==lubridate::as_datetime(dt))
# Initialize the previous state to lagPrecip and the error vector to blank
preState <-dtRecords$lagPrecip[1]
# Loop through and flag state change errors
for (ctr in 1:nrow(dtRecords)) {
# Grow the deletions vector
if ((!preState & dtRecords$chgType[ctr]=="E") | (preState & dtRecords$chgType[ctr]=="B")) {
delVec <- c(delVec, dtRecords$chgTime[ctr])
# do not modify the state, it has not changed due to the deletion
} else {
# do not grow the deletion vector, the state change is OK here if chgType is not "N"
if (dtRecords$chgType[ctr] != "N") { preState <- !preState }
}
# Create a single addition if needed
if (ctr==nrow(dtRecords) & preState != dtRecords$curPrecip[ctr]) {
if (dtRecords$curPrecip[ctr]) { addBegin <- c(addBegin, dt) }
if (!dtRecords$curPrecip[ctr]) { addEnd <- c(addEnd, dt) }
}
}
}
# If there is precipitation at the very end, addEnd for dTime+1
fixEnd <- dfNoIssues %>%
filter(lastRecord & curPrecip) %>%
mutate(timeUse=dtime+1) %>%
pull(timeUse)
if (length(fixEnd) > 0) {
addEnd <- c(addEnd, fixEnd)
cat("\nAdding final end time to cap interval at end of file:", as.character(fixEnd), "\n")
}
# Print the key vectors
cat("\nStart/end times deleted:\n")
print(lubridate::as_datetime(delVec))
cat("\nBegin times added\n")
print(lubridate::as_datetime(addBegin))
cat("\nEnd times added:\n")
print(lubridate::as_datetime(addEnd))
# Create the full list of issue start times
beginIssues <- dfIssues %>%
filter(chgType=="B") %>%
pull(chgTime)
endIssues <- dfIssues %>%
filter(chgType=="E") %>%
pull(chgTime)
beginIssues <- c(beginIssues[!beginIssues %in% delVec], addBegin)
endIssues <- c(endIssues[!endIssues %in% delVec], addEnd)
# cat("\nNew begin times list from issues:\n")
# print(beginIssues)
# cat("\nNew end times list from issues:\n")
# print(endIssues)
# Create the full list of start and end times
beginOK <- dfNoIssues %>%
filter(chgType=="B") %>%
pull(chgTime)
endOK <- dfNoIssues %>%
filter(chgType=="E") %>%
pull(chgTime)
# Return a list of beginTimes and endTimes
list(beginTimes=sort(c(beginOK, beginIssues)),
endTimes=sort(c(endOK, endIssues))
)
}
msp_precip5 <- fnPrecip5(msp_precip3, issueTimes=msp_precip4, lagCurFrame=msp_precip1)
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-05 15:53:00 V1 E22 E 22 2016-04-05 15:22:00 FALSE
## 2 2016-05-29 19:53:00 V1 B31 B 31 2016-05-29 19:31:00 FALSE
## 3 2016-06-20 04:53:00 V1 B11 B 11 2016-06-20 04:11:00 FALSE
## 4 2016-09-06 07:53:00 V1 N53 N 53 2016-09-06 07:53:00 FALSE
## 5 2016-09-06 08:53:00 V2 E01 E 01 2016-09-06 08:01:00 FALSE
## 6 2016-09-06 08:53:00 V3 B17 B 17 2016-09-06 08:17:00 FALSE
## 7 2016-10-18 04:53:00 V1 E35 E 35 2016-10-18 04:35:00 FALSE
## 8 2016-11-22 13:53:00 V1 N53 N 53 2016-11-22 13:53:00 FALSE
## 9 2016-11-29 07:53:00 V1 B18 B 18 2016-11-29 07:18:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-10-18 04:35:00 UTC" "2016-11-29 07:18:00 UTC"
##
## Begin times added
## [1] "2016-04-05 15:53:00 UTC" "2016-09-06 07:53:00 UTC"
##
## End times added:
## [1] "2016-05-29 19:53:00 UTC" "2016-06-20 04:53:00 UTC"
## [3] "2016-09-06 08:53:00 UTC" "2016-11-22 13:53:00 UTC"
The intervals can then be checked for consistency:
fnPrecip6 <- function(lst, df){
# Extract the beginning and interval times
begins <- lst[["beginTimes"]]
ends <- lst[["endTimes"]]
durs <- ends - begins
# Create intervals from the raw list file
precipInts <- interval(begins, ends - 1) # make the interval stop the minute before the end time
# Extract the METAR and date-time information and check for overlaps
dtime <- df %>% pull(dtime)
metar <- df %>% pull(origMETAR)
precipMETAR <- df %>% pull(curPrecip)
intMETAR <- sapply(dtime, FUN=function(x) {x %within% precipInts %>% any()})
# Check for the consistency of the observations and print the mismatches
print(table(precipMETAR, intMETAR))
mism <- which(precipMETAR != intMETAR)
if (length(mism) == 0) {
cat("\nFull matches between METAR observations and intervals\n")
} else {
for (x in mism) {
cat("\nMismatch at time", strftime(dtime[x], format="%Y-%m-%d %H:%M", tz="UTC"), "UTC\n")
print(metar[max(1, x-2):min(length(metar), x+2)])
}
}
list(beginTimes=lst$beginTimes, endTimes=lst$endTimes,
precipInts=precipInts, mismatches=mism, mismatchTimes=dtime[mism]
)
}
msp_precip6 <- fnPrecip6(msp_precip5, msp_precip1)
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8390 0
## TRUE 0 427
##
## Full matches between METAR observations and intervals
The algorithm correctly fixes the Minneapolis issues, though it is still rather clunky. Next steps are to make a general algorithm and run against all locales.
A final function is built to combine the precipitation functions:
# Function to combine precipitation functions
combinePrecipFunctions <- function(df,
pType="(?<!FZ)RA",
errorLengthInconsistency=TRUE
) {
# FUNCTION ARGUMENTS:
# df: data frame or tibble containing processed METAR data, including dtime and origMETAR
# pType: regex code for the precipitation type of interest
# errorLengthInconsistency: boolean, whether to error out if beginTimes, endTimes, or precipInts have different lengths or imply intervals of non-positive durection
# Extract the current precipitation state, lagged precipitation state, and string of begin/end times
precip1 <- fnPrecip1(df, pType=pType)
# Find and delete continuity (next before previous) and consistency (belongs to wrong METAR) errors
precip3 <- fnPrecip2(precip1) %>%
fnPrecip3()
# Find issues with precipitation begin and end times compared to each other and precipitation states
precip4 <- fnPrecip4(precip3, precip1)
# Correct for the issues with begin and end times and create a vector of begin times and end times
precip5 <- fnPrecip5(precip3, issueTimes=precip4, lagCurFrame=precip1)
# Check intervals for consistency with precipitation states
precip6 <- fnPrecip6(precip5, precip1)
# Extract beginTimes, endTimes, and precipInts
beginTimes <- precip6$beginTimes
endTimes <- precip6$endTimes
precipInts <- precip6$precipInts
# Check that the intervals are all positive and with same number of begin times and end times
lenBT <- length(beginTimes)
lenET <- length(endTimes)
lenPI <- length(precipInts)
piNotPositive <- (lubridate::time_length(precipInts) <= 0)
# Check for same lengths
if (max(lenBT, lenET, lenPI) != min(lenBT, lenET, lenPI)) {
cat("Inconsistent lengths - beginTimes:", lenBT, ", endTimes:", lenET, "precipInts:", lenPI, "\n")
if (errorLengthInconsistency) {
stop("Exiting due to inconsistent lengths")
} else {
cat("\nContinuing to process, output frame will append NA where needed\n")
maxLen <- max(lenBT, lenET, lenPI)
beginTimes <- c(beginTimes, rep(NA, maxLen-lenBT))
endTimes <- c(endTimes, rep(NA, maxLen-lenET))
precipInts <- c(precipInts, rep(NA, maxLen-lenPI))
}
}
# Check for positive intervals
if (sum(piNotPositive) > 0) {
cat("\nPrecipitation intervals are non-positive:",
sum(piNotPositive),
"at positions",
which(piNotPositive),
"\n"
)
if (errorLengthInconsistency) { stop("Exiting due to inconsistency in interval lengths") }
}
# Return the relevant information
list(pStateFrame=precip1,
pIssueTimes=precip4,
mismatches=precip6$mismatches,
mismatchTimes=precip6$mismatchTimes,
beginEndInterval=tibble::tibble(beginTimes=beginTimes, endTime=endTimes, precipInts=precipInts)
)
}
The function can then be run for the Minneapolis 2016 rain data:
kmsp_2016_RA <- combinePrecipFunctions(kmsp_2016, pType="(?<!FZ)RA", errorLengthInconsistency=TRUE)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-09-06 08:53:00 V1 B0744 B 0744 2016-09-06 07:44:00 4140 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 8
## Total issues (may be less than sum due to same datetime in both): 8
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-05 15:53:00 V1 E22 E 22 2016-04-05 15:22:00 FALSE
## 2 2016-05-29 19:53:00 V1 B31 B 31 2016-05-29 19:31:00 FALSE
## 3 2016-06-20 04:53:00 V1 B11 B 11 2016-06-20 04:11:00 FALSE
## 4 2016-09-06 07:53:00 V1 N53 N 53 2016-09-06 07:53:00 FALSE
## 5 2016-09-06 08:53:00 V2 E01 E 01 2016-09-06 08:01:00 FALSE
## 6 2016-09-06 08:53:00 V3 B17 B 17 2016-09-06 08:17:00 FALSE
## 7 2016-10-18 04:53:00 V1 E35 E 35 2016-10-18 04:35:00 FALSE
## 8 2016-11-22 13:53:00 V1 N53 N 53 2016-11-22 13:53:00 FALSE
## 9 2016-11-29 07:53:00 V1 B18 B 18 2016-11-29 07:18:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-10-18 04:35:00 UTC" "2016-11-29 07:18:00 UTC"
##
## Begin times added
## [1] "2016-04-05 15:53:00 UTC" "2016-09-06 07:53:00 UTC"
##
## End times added:
## [1] "2016-05-29 19:53:00 UTC" "2016-06-20 04:53:00 UTC"
## [3] "2016-09-06 08:53:00 UTC" "2016-11-22 13:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8390 0
## TRUE 0 427
##
## Full matches between METAR observations and intervals
And the function can then be run for rain for all of the 2016 data:
fileNames <- grep(pattern="2016", names(cityNameMapper), value=TRUE)
rainList <- vector("list", length(fileNames))
names(rainList) <- paste0(fileNames, "_RA")
for (city in fileNames) {
cat("\n\n*** Processing for:", cityNameMapper[city], "\n")
# Run the rain extraction for city and store in rainList
rainList[[paste0(city, "_RA")]] <- combinePrecipFunctions(get(city),
pType="(?<!FZ)RA",
errorLengthInconsistency=TRUE
)
}
##
##
## *** Processing for: Detroit, MI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 7
## Total issues (may be less than sum due to same datetime in both): 7
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-24 11:53:00 V1 N53 N 53 2016-02-24 11:53:00 FALSE
## 2 2016-04-01 00:53:00 V1 B01 B 01 2016-04-01 00:01:00 FALSE
## 3 2016-04-01 00:53:00 V2 E12 E 12 2016-04-01 00:12:00 FALSE
## 4 2016-04-07 22:53:00 V1 N53 N 53 2016-04-07 22:53:00 FALSE
## 5 2016-04-07 23:53:00 V1 N53 N 53 2016-04-07 23:53:00 FALSE
## 6 2016-07-14 00:53:00 V1 E30 E 30 2016-07-14 00:30:00 FALSE
## 7 2016-08-15 00:53:00 V1 B2355 B 2355 2016-08-14 23:55:00 FALSE
## 8 2016-08-15 00:53:00 V2 E29 E 29 2016-08-15 00:29:00 FALSE
## 9 2016-08-16 14:53:00 V1 N53 N 53 2016-08-16 14:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-01 00:01:00 UTC" "2016-07-14 00:30:00 UTC"
## [3] "2016-08-14 23:55:00 UTC"
##
## Begin times added
## [1] "2016-02-24 11:53:00 UTC" "2016-04-07 23:53:00 UTC"
##
## End times added:
## [1] "2016-04-07 22:53:00 UTC" "2016-08-16 14:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8258 0
## TRUE 0 560
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Newark, NJ (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-24 19:51:00 V1 B1858 B 1858 2016-02-24 18:58:00 FALSE
## 2 2016-11-20 08:51:00 V1 E0758 E 0758 2016-11-20 07:58:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-02-24 18:58:00 UTC" "2016-11-20 07:58:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8195 0
## TRUE 0 626
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Green Bay, WI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-05-02 17:53:00 V1 E1659 E 1659 2016-05-02 16:59:00 FALSE
## 2 2016-05-02 17:53:00 V2 B39 B 39 2016-05-02 17:39:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-05-02 17:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8378 0
## TRUE 0 425
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Grand Rapids, MI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-07 12:53:00 V1 E15 E 15 2016-04-07 12:15:00 FALSE
## 2 2016-06-26 13:53:00 V1 B46 B 46 2016-06-26 13:46:00 FALSE
## 3 2016-06-26 14:53:00 V1 N53 N 53 2016-06-26 14:53:00 FALSE
## 4 2016-07-24 15:53:00 V1 N53 N 53 2016-07-24 15:53:00 FALSE
## 5 2016-08-12 09:53:00 V1 B0854 B 0854 2016-08-12 08:54:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-07 12:15:00 UTC" "2016-08-12 08:54:00 UTC"
##
## Begin times added
## [1] "2016-06-26 14:53:00 UTC"
##
## End times added:
## [1] "2016-06-26 13:53:00 UTC" "2016-07-24 15:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8208 0
## TRUE 0 604
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Houston, TX (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-03-23 22:53:00 V1 B2145 B 2145 2016-03-23 21:45:00 4080 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 7
## Total issues (may be less than sum due to same datetime in both): 7
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-23 22:53:00 V2 E16 E 16 2016-03-23 22:16:00 FALSE
## 2 2016-06-06 00:53:00 V1 E40 E 40 2016-06-06 00:40:00 FALSE
## 3 2016-08-14 00:53:00 V1 N53 N 53 2016-08-14 00:53:00 FALSE
## 4 2016-08-14 22:53:00 V1 N53 N 53 2016-08-14 22:53:00 FALSE
## 5 2016-08-21 19:53:00 V1 B1857 B 1857 2016-08-21 18:57:00 FALSE
## 6 2016-08-21 19:53:00 V2 E27 E 27 2016-08-21 19:27:00 FALSE
## 7 2016-08-21 19:53:00 V3 B45 B 45 2016-08-21 19:45:00 FALSE
## 8 2016-12-03 18:53:00 V1 E12 E 12 2016-12-03 18:12:00 FALSE
## 9 2016-12-04 10:53:00 V1 N53 N 53 2016-12-04 10:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-03-23 22:16:00 UTC" "2016-06-06 00:40:00 UTC"
## [3] "2016-12-03 18:12:00 UTC"
##
## Begin times added
## [1] "2016-08-14 00:53:00 UTC" "2016-08-14 22:53:00 UTC"
##
## End times added:
## [1] "2016-08-21 19:53:00 UTC" "2016-12-04 10:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8368 0
## TRUE 0 448
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Indianapolis, IN (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 6 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-07 23:54:00 V1 N54 N 54 2016-04-07 23:54:00 FALSE
## 2 2016-05-08 19:54:00 V1 N54 N 54 2016-05-08 19:54:00 FALSE
## 3 2016-07-18 17:54:00 V1 N54 N 54 2016-07-18 17:54:00 FALSE
## 4 2016-08-07 19:54:00 V1 N54 N 54 2016-08-07 19:54:00 FALSE
## 5 2016-09-05 21:54:00 V1 B01 B 01 2016-09-05 21:01:00 FALSE
## 6 2016-09-05 21:54:00 V2 E38 E 38 2016-09-05 21:38:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-09-05 21:01:00 UTC"
##
## Begin times added
## [1] "2016-07-18 17:54:00 UTC"
##
## End times added:
## [1] "2016-04-07 23:54:00 UTC" "2016-05-08 19:54:00 UTC"
## [3] "2016-08-07 19:54:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8144 0
## TRUE 0 623
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Las Vegas, NV (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-30 23:56:00 V1 N56 N 56 2016-04-30 23:56:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-04-30 23:56:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8710 0
## TRUE 0 108
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Lincoln, NE (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-07-13 18:54:00 V1 N54 N 54 2016-07-13 18:54:00 FALSE
## 2 2016-08-29 22:54:00 V1 N54 N 54 2016-08-29 22:54:00 FALSE
## 3 2016-09-16 06:54:00 V1 N54 N 54 2016-09-16 06:54:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-08-29 22:54:00 UTC" "2016-09-16 06:54:00 UTC"
##
## End times added:
## [1] "2016-07-13 18:54:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8500 0
## TRUE 0 313
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Milwaukee, WI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 4 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-10 10:52:00 V1 N52 N 52 2016-04-10 10:52:00 FALSE
## 2 2016-06-10 03:52:00 V1 N52 N 52 2016-06-10 03:52:00 FALSE
## 3 2016-08-20 13:52:00 V1 N52 N 52 2016-08-20 13:52:00 FALSE
## 4 2016-10-07 02:52:00 V1 N52 N 52 2016-10-07 02:52:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-04-10 10:52:00 UTC" "2016-06-10 03:52:00 UTC"
## [3] "2016-08-20 13:52:00 UTC" "2016-10-07 02:52:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8346 0
## TRUE 0 462
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Madison, WI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-09-07 12:53:00 V1 E52 E 52 2016-09-07 12:52:00 FALSE
## 2 2016-09-16 11:53:00 V1 N53 N 53 2016-09-16 11:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-09-07 12:52:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-09-16 11:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8347 0
## TRUE 0 451
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Minneapolis, MN (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-09-06 08:53:00 V1 B0744 B 0744 2016-09-06 07:44:00 4140 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 8
## Total issues (may be less than sum due to same datetime in both): 8
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-05 15:53:00 V1 E22 E 22 2016-04-05 15:22:00 FALSE
## 2 2016-05-29 19:53:00 V1 B31 B 31 2016-05-29 19:31:00 FALSE
## 3 2016-06-20 04:53:00 V1 B11 B 11 2016-06-20 04:11:00 FALSE
## 4 2016-09-06 07:53:00 V1 N53 N 53 2016-09-06 07:53:00 FALSE
## 5 2016-09-06 08:53:00 V2 E01 E 01 2016-09-06 08:01:00 FALSE
## 6 2016-09-06 08:53:00 V3 B17 B 17 2016-09-06 08:17:00 FALSE
## 7 2016-10-18 04:53:00 V1 E35 E 35 2016-10-18 04:35:00 FALSE
## 8 2016-11-22 13:53:00 V1 N53 N 53 2016-11-22 13:53:00 FALSE
## 9 2016-11-29 07:53:00 V1 B18 B 18 2016-11-29 07:18:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-10-18 04:35:00 UTC" "2016-11-29 07:18:00 UTC"
##
## Begin times added
## [1] "2016-04-05 15:53:00 UTC" "2016-09-06 07:53:00 UTC"
##
## End times added:
## [1] "2016-05-29 19:53:00 UTC" "2016-06-20 04:53:00 UTC"
## [3] "2016-09-06 08:53:00 UTC" "2016-11-22 13:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8390 0
## TRUE 0 427
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: New Orleans, LA (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2017-01-01 09:53:00 V1 E0852 E 0852 2017-01-01 08:52:00 3660 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-11 20:53:00 V1 B07 B 07 2016-03-11 20:07:00 FALSE
## 2 2016-03-27 16:53:00 V1 N53 N 53 2016-03-27 16:53:00 FALSE
## 3 2016-08-14 23:53:00 V1 B39 B 39 2016-08-14 23:39:00 FALSE
## 4 2016-12-13 23:53:00 V1 E2258 E 2258 2016-12-13 22:58:00 FALSE
## 5 2016-12-13 23:53:00 V2 B30 B 30 2016-12-13 23:30:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Adding final end time to cap interval at end of file: 2017-01-01 23:53:01
##
## Start/end times deleted:
## [1] "2016-03-11 20:07:00 UTC" "2016-12-13 22:58:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-03-27 16:53:00 UTC" "2016-08-14 23:53:00 UTC"
## [3] "2017-01-01 23:53:01 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8375 0
## TRUE 0 438
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Chicago, IL (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-24 21:51:00 V1 N51 N 51 2016-03-24 21:51:00 FALSE
## 2 2016-07-13 18:51:00 V1 N51 N 51 2016-07-13 18:51:00 FALSE
## 3 2016-08-29 19:51:00 V1 N51 N 51 2016-08-29 19:51:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-03-24 21:51:00 UTC" "2016-08-29 19:51:00 UTC"
##
## End times added:
## [1] "2016-07-13 18:51:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8349 0
## TRUE 0 466
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: San Diego, CA (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8612 0
## TRUE 0 198
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Traverse City, MI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 7 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-28 10:53:00 V1 E0957 E 0957 2016-03-28 09:57:00 FALSE
## 2 2016-06-06 20:53:00 V1 B39 B 39 2016-06-06 20:39:00 FALSE
## 3 2016-06-06 20:53:00 V2 E49 E 49 2016-06-06 20:49:00 FALSE
## 4 2016-06-06 21:53:00 V1 B17 B 17 2016-06-06 21:17:00 FALSE
## 5 2016-06-06 21:53:00 V2 E38 E 38 2016-06-06 21:38:00 FALSE
## 6 2016-09-28 19:53:00 V1 N53 N 53 2016-09-28 19:53:00 FALSE
## 7 2016-12-02 16:53:00 V1 E29 E 29 2016-12-02 16:29:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-03-28 09:57:00 UTC" "2016-06-06 21:17:00 UTC"
## [3] "2016-12-02 16:29:00 UTC"
##
## Begin times added
## [1] "2016-06-06 20:53:00 UTC"
##
## End times added:
## [1] "2016-09-28 19:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8345 0
## TRUE 0 469
##
## Full matches between METAR observations and intervals
The rain data parsed successfully, with all intervals of the proper length.
The function can also be run for snow for all of the 2016 data:
fileNames <- grep(pattern="2016", names(cityNameMapper), value=TRUE)
snowList <- vector("list", length(fileNames))
names(snowList) <- paste0(fileNames, "_SN")
for (city in fileNames) {
cat("\n\n*** Processing for:", cityNameMapper[city], "\n")
# Run the snow extraction for city and store in snowList
snowList[[paste0(city, "_SN")]] <- combinePrecipFunctions(get(city),
pType="(?<!BL)SN",
errorLengthInconsistency=TRUE
)
}
##
##
## *** Processing for: Detroit, MI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-09 10:53:00 V1 N53 N 53 2016-02-09 10:53:00 FALSE
## 2 2016-02-24 11:53:00 V1 B12 B 12 2016-02-24 11:12:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-02-09 10:53:00 UTC"
##
## End times added:
## [1] "2016-02-24 11:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8344 0
## TRUE 0 474
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Newark, NJ (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-10 14:51:00 V1 N51 N 51 2016-02-10 14:51:00 FALSE
## 2 2016-11-20 07:51:00 V1 N51 N 51 2016-11-20 07:51:00 FALSE
## 3 2016-11-20 08:51:00 V1 E0756 E 0756 2016-11-20 07:56:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-11-20 07:56:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-02-10 14:51:00 UTC" "2016-11-20 07:51:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8683 0
## TRUE 0 138
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Green Bay, WI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-12-07 15:53:00 V1 E09 E 09 2016-12-07 15:09:00 FALSE
## 2 2016-12-07 15:53:00 V2 B38 B 38 2016-12-07 15:38:00 FALSE
## 3 2016-12-29 17:53:00 V1 N53 N 53 2016-12-29 17:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-12-07 15:09:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-12-29 17:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8460 0
## TRUE 0 343
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Grand Rapids, MI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-04 20:53:00 V1 N53 N 53 2016-02-04 20:53:00 FALSE
## 2 2016-04-02 16:53:00 V1 N53 N 53 2016-04-02 16:53:00 FALSE
## 3 2016-04-02 17:53:00 V1 N53 N 53 2016-04-02 17:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-02-04 20:53:00 UTC" "2016-04-02 17:53:00 UTC"
##
## End times added:
## [1] "2016-04-02 16:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8137 0
## TRUE 0 675
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Houston, TX (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8816
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Indianapolis, IN (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8529 0
## TRUE 0 238
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Las Vegas, NV (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8818
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Lincoln, NE (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-19 12:54:00 V1 B1155 B 1155 2016-01-19 11:55:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-01-19 11:55:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8719 0
## TRUE 0 94
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Milwaukee, WI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-26 01:52:00 V1 E39 E 39 2016-01-26 01:39:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-01-26 01:39:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8462 0
## TRUE 0 346
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Madison, WI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 6
## Total issues (may be less than sum due to same datetime in both): 6
##
##
## Records to be addressed include:
## # A tibble: 6 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-09 14:53:00 V1 E39 E 39 2016-01-09 14:39:00 FALSE
## 2 2016-01-12 12:53:00 V1 N53 N 53 2016-01-12 12:53:00 FALSE
## 3 2016-02-09 04:53:00 V1 N53 N 53 2016-02-09 04:53:00 FALSE
## 4 2016-02-09 05:53:00 V1 N53 N 53 2016-02-09 05:53:00 FALSE
## 5 2016-04-08 14:53:00 V1 E36 E 36 2016-04-08 14:36:00 FALSE
## 6 2016-12-12 05:53:00 V1 E01 E 01 2016-12-12 05:01:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-01-09 14:39:00 UTC" "2016-04-08 14:36:00 UTC"
## [3] "2016-12-12 05:01:00 UTC"
##
## Begin times added
## [1] "2016-02-09 05:53:00 UTC"
##
## End times added:
## [1] "2016-01-12 12:53:00 UTC" "2016-02-09 04:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8435 0
## TRUE 0 363
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Minneapolis, MN (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 4 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-12 00:53:00 V1 N53 N 53 2016-01-12 00:53:00 FALSE
## 2 2016-12-07 14:53:00 V1 N53 N 53 2016-12-07 14:53:00 FALSE
## 3 2016-12-07 15:53:00 V1 N53 N 53 2016-12-07 15:53:00 FALSE
## 4 2016-12-08 22:53:00 V1 N53 N 53 2016-12-08 22:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-01-12 00:53:00 UTC" "2016-12-07 15:53:00 UTC"
## [3] "2016-12-08 22:53:00 UTC"
##
## End times added:
## [1] "2016-12-07 14:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8438 0
## TRUE 0 379
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: New Orleans, LA (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8813
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Chicago, IL (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-10 00:51:00 V1 N51 N 51 2016-02-10 00:51:00 FALSE
## 2 2016-03-24 21:51:00 V1 N51 N 51 2016-03-24 21:51:00 FALSE
## 3 2016-04-08 22:51:00 V1 N51 N 51 2016-04-08 22:51:00 FALSE
## 4 2016-04-09 04:51:00 V1 E46 E 46 2016-04-09 04:46:00 FALSE
## 5 2016-12-24 01:51:00 V1 N51 N 51 2016-12-24 01:51:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-09 04:46:00 UTC"
##
## Begin times added
## [1] "2016-02-10 00:51:00 UTC" "2016-03-24 21:51:00 UTC"
## [3] "2016-04-08 22:51:00 UTC" "2016-12-24 01:51:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8532 0
## TRUE 0 283
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: San Diego, CA (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8810
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Traverse City, MI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-12-02 16:53:00 V1 E40 E 40 2016-12-02 16:40:00 FALSE
## 2 2016-12-18 06:53:00 V1 N53 N 53 2016-12-18 06:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-12-02 16:40:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-12-18 06:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 7828 0
## TRUE 0 986
##
## Full matches between METAR observations and intervals
The snow data parsed successfully, with all intervals of the proper length.
The function can also be run for thunderstorm for all of the 2016 data:
fileNames <- grep(pattern="2016", names(cityNameMapper), value=TRUE)
thunderList <- vector("list", length(fileNames))
names(thunderList) <- paste0(fileNames, "_TS")
for (city in fileNames) {
cat("\n\n*** Processing for:", cityNameMapper[city], "\n")
# Run the thunder extraction for city and store in thunderList
thunderList[[paste0(city, "_TS")]] <- combinePrecipFunctions(get(city),
pType="(?<!VC)TS",
errorLengthInconsistency=TRUE
)
}
##
##
## *** Processing for: Detroit, MI (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-07-14 00:53:00 V1 E30 E 30 2016-07-14 00:30:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-07-14 00:30:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8784 0
## TRUE 0 34
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Newark, NJ (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-24 19:51:00 V1 N51 N 51 2016-02-24 19:51:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-02-24 19:51:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8785 0
## TRUE 0 36
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Green Bay, WI (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8782 0
## TRUE 0 21
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Grand Rapids, MI (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-05-12 08:53:00 V1 E47 E 47 2016-05-12 08:47:00 FALSE
## 2 2016-06-15 07:53:00 V1 E18 E 18 2016-06-15 07:18:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-05-12 08:47:00 UTC" "2016-06-15 07:18:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8758 0
## TRUE 0 54
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Houston, TX (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-08-14 00:53:00 V1 N53 N 53 2016-08-14 00:53:00 FALSE
## 2 2016-08-14 22:53:00 V1 N53 N 53 2016-08-14 22:53:00 FALSE
## 3 2016-09-20 22:53:00 V1 E05 E 05 2016-09-20 22:05:00 FALSE
## 4 2016-09-24 19:53:00 V1 E11 E 11 2016-09-24 19:11:00 FALSE
## 5 2016-09-24 19:53:00 V2 B29 B 29 2016-09-24 19:29:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-09-20 22:05:00 UTC" "2016-09-24 19:11:00 UTC"
##
## Begin times added
## [1] "2016-08-14 00:53:00 UTC" "2016-08-14 22:53:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8644 0
## TRUE 0 172
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Indianapolis, IN (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-07-18 18:54:00 V2 E21 E 21 2016-07-18 18:21:00 1980 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 14
## Issues where state change does not add to the total: 16
## Total issues (may be less than sum due to same datetime in both): 16
##
##
## Records to be addressed include:
## # A tibble: 31 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-28 10:54:00 V1 B37 B 37 2016-04-28 10:37:00 FALSE
## 2 2016-06-27 02:54:00 V1 N54 N 54 2016-06-27 02:54:00 FALSE
## 3 2016-07-13 20:54:00 V1 E40 E 40 2016-07-13 20:40:00 FALSE
## 4 2016-07-18 18:54:00 V1 B38 B 38 2016-07-18 18:38:00 FALSE
## 5 2016-09-08 19:54:00 V1 E27 E 27 2016-09-08 19:27:00 FALSE
## 6 2016-09-08 19:54:00 V2 B42 B 42 2016-09-08 19:42:00 FALSE
## 7 2016-09-10 06:54:00 V1 E30 E 30 2016-09-10 06:30:00 FALSE
## 8 2016-09-10 06:54:00 V2 E31 E 31 2016-09-10 06:31:00 FALSE
## 9 2016-09-10 06:54:00 V3 B37 B 37 2016-09-10 06:37:00 FALSE
## 10 2016-09-16 19:54:00 V1 E13 E 13 2016-09-16 19:13:00 FALSE
## # ... with 21 more rows, and 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-28 10:37:00 UTC" "2016-07-13 20:40:00 UTC"
## [3] "2016-07-18 18:38:00 UTC" "2016-09-08 19:27:00 UTC"
## [5] "2016-09-10 06:31:00 UTC" "2016-09-16 19:13:00 UTC"
## [7] "2016-09-26 02:04:00 UTC" "2016-09-26 04:42:00 UTC"
## [9] "2016-09-26 06:02:00 UTC" "2016-09-28 15:01:00 UTC"
## [11] "2016-09-28 20:10:00 UTC" "2016-09-30 23:06:00 UTC"
## [13] "2016-10-20 01:04:00 UTC" "2016-10-20 03:32:00 UTC"
##
## Begin times added
## [1] "2016-06-27 02:54:00 UTC"
##
## End times added:
## [1] "2016-07-18 18:54:00 UTC" "2016-10-20 06:54:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8671 0
## TRUE 0 96
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Las Vegas, NV (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 5
## Issues where state change does not add to the total: 7
## Total issues (may be less than sum due to same datetime in both): 7
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-29 00:56:00 V1 E09 E 09 2016-04-29 00:09:00 FALSE
## 2 2016-04-29 00:56:00 V2 B29 B 29 2016-04-29 00:29:00 FALSE
## 3 2016-04-30 23:56:00 V1 N56 N 56 2016-04-30 23:56:00 FALSE
## 4 2016-06-28 22:56:00 V1 E51 E 51 2016-06-28 22:51:00 FALSE
## 5 2016-08-04 04:56:00 V1 N56 N 56 2016-08-04 04:56:00 FALSE
## 6 2016-08-04 14:56:00 V1 E03 E 03 2016-08-04 14:03:00 FALSE
## 7 2016-08-27 03:56:00 V1 E53 E 53 2016-08-27 03:53:00 FALSE
## 8 2016-10-24 20:56:00 V1 E19 E 19 2016-10-24 20:19:00 FALSE
## 9 2016-10-24 20:56:00 V2 B36 B 36 2016-10-24 20:36:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-29 00:09:00 UTC" "2016-06-28 22:51:00 UTC"
## [3] "2016-08-04 14:03:00 UTC" "2016-08-27 03:53:00 UTC"
## [5] "2016-10-24 20:19:00 UTC"
##
## Begin times added
## [1] "2016-04-30 23:56:00 UTC" "2016-08-04 04:56:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8783 0
## TRUE 0 35
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Lincoln, NE (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-24 10:54:00 V1 N54 N 54 2016-04-24 10:54:00 FALSE
## 2 2016-05-23 14:54:00 V1 E31 E 31 2016-05-23 14:31:00 FALSE
## 3 2016-05-23 14:54:00 V2 B44 B 44 2016-05-23 14:44:00 FALSE
## 4 2016-06-18 08:54:00 V1 B44 B 44 2016-06-18 08:44:00 FALSE
## 5 2016-09-16 06:54:00 V1 E0558 E 0558 2016-09-16 05:58:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-05-23 14:31:00 UTC" "2016-06-18 08:44:00 UTC"
## [3] "2016-09-16 05:58:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-04-24 10:54:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8758 0
## TRUE 0 55
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Milwaukee, WI (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 8
## Total issues (may be less than sum due to same datetime in both): 8
##
##
## Records to be addressed include:
## # A tibble: 8 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-31 17:52:00 V1 N52 N 52 2016-03-31 17:52:00 FALSE
## 2 2016-04-06 08:52:00 V1 E10 E 10 2016-04-06 08:10:00 FALSE
## 3 2016-06-10 03:52:00 V1 N52 N 52 2016-06-10 03:52:00 FALSE
## 4 2016-06-26 06:52:00 V1 N52 N 52 2016-06-26 06:52:00 FALSE
## 5 2016-06-26 11:52:00 V1 N52 N 52 2016-06-26 11:52:00 FALSE
## 6 2016-07-01 04:52:00 V1 E07 E 07 2016-07-01 04:07:00 FALSE
## 7 2016-07-06 11:52:00 V1 N52 N 52 2016-07-06 11:52:00 FALSE
## 8 2016-07-17 13:52:00 V1 N52 N 52 2016-07-17 13:52:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-06 08:10:00 UTC" "2016-07-01 04:07:00 UTC"
##
## Begin times added
## [1] "2016-06-10 03:52:00 UTC" "2016-06-26 06:52:00 UTC"
## [3] "2016-06-26 11:52:00 UTC"
##
## End times added:
## [1] "2016-03-31 17:52:00 UTC" "2016-07-06 11:52:00 UTC"
## [3] "2016-07-17 13:52:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8757 0
## TRUE 0 51
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Madison, WI (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 7
## Issues where state change does not add to the total: 9
## Total issues (may be less than sum due to same datetime in both): 9
##
##
## Records to be addressed include:
## # A tibble: 15 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-16 07:53:00 V1 N53 N 53 2016-03-16 07:53:00 FALSE
## 2 2016-05-27 10:53:00 V1 E14 E 14 2016-05-27 10:14:00 FALSE
## 3 2016-05-27 10:53:00 V2 B32 B 32 2016-05-27 10:32:00 FALSE
## 4 2016-07-11 04:53:00 V1 E00 E 00 2016-07-11 04:00:00 FALSE
## 5 2016-09-07 12:53:00 V1 E14 E 14 2016-09-07 12:14:00 FALSE
## 6 2016-09-13 08:53:00 V1 E22 E 22 2016-09-13 08:22:00 FALSE
## 7 2016-09-13 08:53:00 V2 B38 B 38 2016-09-13 08:38:00 FALSE
## 8 2016-09-13 08:53:00 V3 E53 E 53 2016-09-13 08:53:00 FALSE
## 9 2016-09-16 11:53:00 V1 E20 E 20 2016-09-16 11:20:00 FALSE
## 10 2016-09-19 21:53:00 V1 E45 E 45 2016-09-19 21:45:00 FALSE
## 11 2016-09-20 00:53:00 V1 B07 B 07 2016-09-20 00:07:00 FALSE
## 12 2016-10-16 04:53:00 V1 B0358 B 0358 2016-10-16 03:58:00 FALSE
## 13 2016-10-16 04:53:00 V2 E01 E 01 2016-10-16 04:01:00 FALSE
## 14 2016-10-16 04:53:00 V3 E03 E 03 2016-10-16 04:03:00 FALSE
## 15 2016-10-16 04:53:00 V4 B15 B 15 2016-10-16 04:15:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-05-27 10:14:00 UTC" "2016-07-11 04:00:00 UTC"
## [3] "2016-09-07 12:14:00 UTC" "2016-09-13 08:22:00 UTC"
## [5] "2016-09-16 11:20:00 UTC" "2016-09-19 21:45:00 UTC"
## [7] "2016-10-16 04:03:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-03-16 07:53:00 UTC" "2016-09-20 00:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8735 0
## TRUE 0 63
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Minneapolis, MN (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-09-06 08:53:00 V1 B0735 B 0735 2016-09-06 07:35:00 4680 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 6
## Issues where state change does not add to the total: 6
## Total issues (may be less than sum due to same datetime in both): 6
##
##
## Records to be addressed include:
## # A tibble: 14 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-07-10 10:53:00 V1 E20 E 20 2016-07-10 10:20:00 FALSE
## 2 2016-07-10 10:53:00 V2 B26 B 26 2016-07-10 10:26:00 FALSE
## 3 2016-07-23 15:53:00 V1 E28 E 28 2016-07-23 15:28:00 FALSE
## 4 2016-07-23 15:53:00 V2 B34 B 34 2016-07-23 15:34:00 FALSE
## 5 2016-08-11 04:53:00 V1 E0359 E 0359 2016-08-11 03:59:00 FALSE
## 6 2016-08-11 04:53:00 V2 B02 B 02 2016-08-11 04:02:00 FALSE
## 7 2016-08-11 04:53:00 V3 E50 E 50 2016-08-11 04:50:00 FALSE
## 8 2016-08-16 21:53:00 V1 E09 E 09 2016-08-16 21:09:00 FALSE
## 9 2016-08-16 21:53:00 V2 B21 B 21 2016-08-16 21:21:00 FALSE
## 10 2016-09-22 02:53:00 V1 E48 E 48 2016-09-22 02:48:00 FALSE
## 11 2016-09-22 02:53:00 V2 E49 E 49 2016-09-22 02:49:00 FALSE
## 12 2016-10-18 04:53:00 V1 E03 E 03 2016-10-18 04:03:00 FALSE
## 13 2016-10-18 04:53:00 V2 B05 B 05 2016-10-18 04:05:00 FALSE
## 14 2016-10-18 04:53:00 V3 E20 E 20 2016-10-18 04:20:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-07-10 10:20:00 UTC" "2016-07-23 15:28:00 UTC"
## [3] "2016-08-11 03:59:00 UTC" "2016-08-16 21:09:00 UTC"
## [5] "2016-09-22 02:49:00 UTC" "2016-10-18 04:03:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8763 0
## TRUE 0 54
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: New Orleans, LA (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 14
## Issues where state change does not add to the total: 18
## Total issues (may be less than sum due to same datetime in both): 18
##
##
## Records to be addressed include:
## # A tibble: 27 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-11 18:53:00 V1 N53 N 53 2016-03-11 18:53:00 FALSE
## 2 2016-03-11 19:53:00 V1 N53 N 53 2016-03-11 19:53:00 FALSE
## 3 2016-04-13 11:53:00 V1 E14 E 14 2016-04-13 11:14:00 FALSE
## 4 2016-04-13 11:53:00 V2 B37 B 37 2016-04-13 11:37:00 FALSE
## 5 2016-04-13 11:53:00 V3 B50 B 50 2016-04-13 11:50:00 FALSE
## 6 2016-06-03 22:53:00 V1 B40 B 40 2016-06-03 22:40:00 FALSE
## 7 2016-06-28 21:53:00 V1 B2056 B 2056 2016-06-28 20:56:00 FALSE
## 8 2016-06-28 21:53:00 V2 E22 E 22 2016-06-28 21:22:00 FALSE
## 9 2016-07-13 18:53:00 V1 N53 N 53 2016-07-13 18:53:00 FALSE
## 10 2016-09-02 02:53:00 V1 E31 E 31 2016-09-02 02:31:00 FALSE
## # ... with 17 more rows, and 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-13 11:50:00 UTC" "2016-06-03 22:40:00 UTC"
## [3] "2016-06-28 20:56:00 UTC" "2016-09-02 02:31:00 UTC"
## [5] "2016-09-03 18:58:00 UTC" "2016-09-04 14:05:00 UTC"
## [7] "2016-09-04 19:50:00 UTC" "2016-09-05 14:03:00 UTC"
## [9] "2016-09-06 20:44:00 UTC" "2016-09-10 19:31:00 UTC"
## [11] "2016-09-15 20:19:00 UTC" "2016-09-27 00:45:00 UTC"
## [13] "2016-12-06 00:57:00 UTC" "2016-12-06 02:02:00 UTC"
##
## Begin times added
## [1] "2016-03-11 18:53:00 UTC" "2016-07-13 18:53:00 UTC"
##
## End times added:
## [1] "2016-03-11 19:53:00 UTC" "2016-09-11 21:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8606 0
## TRUE 0 207
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Chicago, IL (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 4
## Issues where state change does not add to the total: 6
## Total issues (may be less than sum due to same datetime in both): 6
##
##
## Records to be addressed include:
## # A tibble: 7 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-08-28 08:51:00 V1 B25 B 25 2016-08-28 08:25:00 FALSE
## 2 2016-08-28 09:51:00 V1 E10 E 10 2016-08-28 09:10:00 FALSE
## 3 2016-09-21 17:51:00 V1 B44 B 44 2016-09-21 17:44:00 FALSE
## 4 2016-09-21 18:51:00 V1 E1759 E 1759 2016-09-21 17:59:00 FALSE
## 5 2016-10-06 21:51:00 V1 E31 E 31 2016-10-06 21:31:00 FALSE
## 6 2016-10-12 17:51:00 V1 E33 E 33 2016-10-12 17:33:00 FALSE
## 7 2016-10-12 17:51:00 V2 B36 B 36 2016-10-12 17:36:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-08-28 09:10:00 UTC" "2016-09-21 17:59:00 UTC"
## [3] "2016-10-06 21:31:00 UTC" "2016-10-12 17:33:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-08-28 08:51:00 UTC" "2016-09-21 17:51:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8749 0
## TRUE 0 66
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: San Diego, CA (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8808 0
## TRUE 0 2
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Traverse City, MI (2016)
##
## Regex search code is: ((?<!VC)TS[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 5
## Issues where state change does not add to the total: 6
## Total issues (may be less than sum due to same datetime in both): 6
##
##
## Records to be addressed include:
## # A tibble: 8 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-16 12:53:00 V1 N53 N 53 2016-03-16 12:53:00 FALSE
## 2 2016-06-11 02:53:00 V1 E21 E 21 2016-06-11 02:21:00 FALSE
## 3 2016-06-11 02:53:00 V2 E31 E 31 2016-06-11 02:31:00 FALSE
## 4 2016-07-14 05:53:00 V1 E35 E 35 2016-07-14 05:35:00 FALSE
## 5 2016-07-14 05:53:00 V2 B47 B 47 2016-07-14 05:47:00 FALSE
## 6 2016-08-17 07:53:00 V1 E40 E 40 2016-08-17 07:40:00 FALSE
## 7 2016-08-24 11:53:00 V1 E43 E 43 2016-08-24 11:43:00 FALSE
## 8 2016-09-26 02:53:00 V1 E52 E 52 2016-09-26 02:52:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-06-11 02:31:00 UTC" "2016-07-14 05:35:00 UTC"
## [3] "2016-08-17 07:40:00 UTC" "2016-08-24 11:43:00 UTC"
## [5] "2016-09-26 02:52:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-03-16 12:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8781 0
## TRUE 0 33
##
## Full matches between METAR observations and intervals
The thunder data parsed successfully, with all intervals of the proper length. Notably, there were proportionally many more changes to the begin times and end times for thunder than with rain or snow, suggesting that tracking TSBnnEnn may be less precise than tracking RABnnEnn and SNBnnEnn.
Next steps are to create and plot metrics by precipitation type across locales.
There are several areas of interest for precipitation:
The first function takes a list and returns total precipitation length and total precipitation events by locale and month (overall totals can be summed from this):
# Extract precipitation information (total hours, number of events) by locale and month
precipByLocale <- function(lst,
listItem="beginEndInterval",
subName=-4,
intervalVar="precipInts",
minTime=ymd("2016-01-01"),
maxTime=ymd("2016-12-31"),
mapper=cityNameMapper
) {
# FUNCTION ARGUMENTS:
# lst - the list containing the output for the specified precipitation type
# listItem - the name of the item to extract from each element of the list
# subName - the second argument for str_sub() for converting list item name to original file name
# intervalVar - name of the interval variable in listItem
# minTime - ignore any interval with a start time before this date
# maxTime - ignore any interval with a start time after this date
# mapper - named vector for mapping source to descriptive locale
# Get the names of the objects in the list
beNames <- names(lst) %>% str_sub(1, subName)
# Get the interval data by locale and apply the original file names
beData <- sapply(lst, FUN="[", listItem)
names(beData) <- beNames
# Get the total precipitation length and total precipitation events by month
# Exclude any event that begins before beginDate or after afterDate
# Count the precipitation as being in the month when it begins
# Helper function for extraction of total precipitation by month
getMonthly <- function(x, y=intervalVar, minT=minTime, maxT=maxTime) {
x %>%
mutate(intStart=lubridate::int_start(get(intervalVar)),
intHours=lubridate::time_length(get(intervalVar))/3600,
month=factor(month.abb[lubridate::month(intStart)], levels=month.abb)
) %>%
filter(intStart >= minTime, intStart <= maxTime) %>%
group_by(month) %>%
summarize(hours=sum(intHours), events=n()) %>%
right_join(tibble::tibble(month=factor(month.abb, levels=month.abb)), by="month") %>%
mutate(hours=ifelse(is.na(hours), 0, hours),
events=ifelse(is.na(events), 0 , events)
) %>%
select(month, hours, events) %>%
pivot_longer(-month, names_to="variable", values_to="value")
}
# Extract total hours and events by month
monPrecip <- purrr::map_dfr(beData, .f=getMonthly, .id="source")
# Pivot wider so that file is unique by source-month with hours and events are columns
# Add locale name from mapper
monPrecip %>%
pivot_wider(c(source, month), names_from="variable", values_from="value") %>%
mutate(locale=mapper[source]) %>%
select(source, locale, everything())
}
The function can then be run on thunderList:
thunderLength <- precipByLocale(thunderList)
thunderLength
## # A tibble: 180 x 5
## source locale month hours events
## <chr> <chr> <fct> <dbl> <dbl>
## 1 kdtw_2016 Detroit, MI (2016) Jan 0 0
## 2 kdtw_2016 Detroit, MI (2016) Feb 1.48 2
## 3 kdtw_2016 Detroit, MI (2016) Mar 5.93 5
## 4 kdtw_2016 Detroit, MI (2016) Apr 4.05 3
## 5 kdtw_2016 Detroit, MI (2016) May 1.83 2
## 6 kdtw_2016 Detroit, MI (2016) Jun 1.68 2
## 7 kdtw_2016 Detroit, MI (2016) Jul 7.23 6
## 8 kdtw_2016 Detroit, MI (2016) Aug 6.38 10
## 9 kdtw_2016 Detroit, MI (2016) Sep 1.70 3
## 10 kdtw_2016 Detroit, MI (2016) Oct 0.916 2
## # ... with 170 more rows
A function can then be written to create four types of plots:
plotPrecipitation <- function(tbl,
pTypeName=""
) {
# FUNCTION ARGUMENTS:
# tbl: the tibble containing the processed data
# pTypeName: the type of precipitation as a character
# Plot 1: Total precipitation length by locale
p1 <- tbl %>%
group_by(locale) %>%
summarize(hours=sum(hours)) %>%
ggplot(aes(x=fct_reorder(locale, hours), y=hours)) +
geom_col(fill="lightblue") +
geom_text(aes(label=round(hours), y=hours/2)) +
coord_flip() +
labs(x="",
y="Hours",
title=paste0("Total hours of ", stringr::str_to_lower(pTypeName), " by locale")
)
print(p1)
# Plot 2: Total precipitation events by locale
p2 <- tbl %>%
group_by(locale) %>%
summarize(events=sum(events)) %>%
ggplot(aes(x=fct_reorder(locale, events), y=events)) +
geom_col(fill="lightblue") +
geom_text(aes(label=round(events), y=events/2)) +
coord_flip() +
labs(x="",
y="# Unique Events",
title=paste0("Total unique events of ", stringr::str_to_lower(pTypeName), " by locale"),
caption="An event is defined by a begin and end and may be as short as a minute\nor with as little as a minute gap to the next event"
)
print(p2)
# Plot 3: Precipitation length by month, facetted by locale
mm <- tbl %>%
group_by(month) %>%
summarize(hours=mean(hours))
p3 <- tbl %>%
ggplot(aes(x=month, y=hours)) +
geom_col(fill="lightblue") +
labs(y="Hours",
x="",
title=paste0("Total hours of ", stringr::str_to_lower(pTypeName), " by month"),
subtitle="Red dots are the monthly average across locales in this plot"
) +
geom_point(data=mm, color="red") +
facet_wrap(~ locale) +
theme(axis.text.x=element_text(angle=90))
print(p3)
# Plot 4: Precipitation events by month, facetted by locale
mm <- tbl %>%
group_by(month) %>%
summarize(events=mean(events))
p4 <- tbl %>%
ggplot(aes(x=month, y=events)) +
geom_col(fill="lightblue") +
labs(y="# Unique Events",
x="",
title=paste0("Total unique events of ", stringr::str_to_lower(pTypeName), " by month"),
subtitle="Red dots are the monthly average across locales in this plot",
caption="An event is defined by a begin and end and may be as short as a minute\nor with as little as a minute gap to the next event"
) +
geom_point(data=mm, color="red") +
facet_wrap(~ locale) +
theme(axis.text.x=element_text(angle=90))
print(p4)
}
The function is then run for the thunder data:
plotPrecipitation(thunderLength, pTypeName="thunder")
Processing is then run for the rain and snow data:
fileNames <- grep(pattern="2016", names(cityNameMapper), value=TRUE)
rainList <- vector("list", length(fileNames))
names(rainList) <- paste0(fileNames, "_RA")
for (city in fileNames) {
cat("\n\n*** Processing for:", cityNameMapper[city], "\n")
rainList[[paste0(city, "_RA")]] <- combinePrecipFunctions(get(city),
pType="(?<!FZ)RA",
errorLengthInconsistency=TRUE
)
}
##
##
## *** Processing for: Detroit, MI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 7
## Total issues (may be less than sum due to same datetime in both): 7
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-24 11:53:00 V1 N53 N 53 2016-02-24 11:53:00 FALSE
## 2 2016-04-01 00:53:00 V1 B01 B 01 2016-04-01 00:01:00 FALSE
## 3 2016-04-01 00:53:00 V2 E12 E 12 2016-04-01 00:12:00 FALSE
## 4 2016-04-07 22:53:00 V1 N53 N 53 2016-04-07 22:53:00 FALSE
## 5 2016-04-07 23:53:00 V1 N53 N 53 2016-04-07 23:53:00 FALSE
## 6 2016-07-14 00:53:00 V1 E30 E 30 2016-07-14 00:30:00 FALSE
## 7 2016-08-15 00:53:00 V1 B2355 B 2355 2016-08-14 23:55:00 FALSE
## 8 2016-08-15 00:53:00 V2 E29 E 29 2016-08-15 00:29:00 FALSE
## 9 2016-08-16 14:53:00 V1 N53 N 53 2016-08-16 14:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-01 00:01:00 UTC" "2016-07-14 00:30:00 UTC"
## [3] "2016-08-14 23:55:00 UTC"
##
## Begin times added
## [1] "2016-02-24 11:53:00 UTC" "2016-04-07 23:53:00 UTC"
##
## End times added:
## [1] "2016-04-07 22:53:00 UTC" "2016-08-16 14:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8258 0
## TRUE 0 560
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Newark, NJ (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-24 19:51:00 V1 B1858 B 1858 2016-02-24 18:58:00 FALSE
## 2 2016-11-20 08:51:00 V1 E0758 E 0758 2016-11-20 07:58:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-02-24 18:58:00 UTC" "2016-11-20 07:58:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8195 0
## TRUE 0 626
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Green Bay, WI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-05-02 17:53:00 V1 E1659 E 1659 2016-05-02 16:59:00 FALSE
## 2 2016-05-02 17:53:00 V2 B39 B 39 2016-05-02 17:39:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-05-02 17:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8378 0
## TRUE 0 425
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Grand Rapids, MI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-07 12:53:00 V1 E15 E 15 2016-04-07 12:15:00 FALSE
## 2 2016-06-26 13:53:00 V1 B46 B 46 2016-06-26 13:46:00 FALSE
## 3 2016-06-26 14:53:00 V1 N53 N 53 2016-06-26 14:53:00 FALSE
## 4 2016-07-24 15:53:00 V1 N53 N 53 2016-07-24 15:53:00 FALSE
## 5 2016-08-12 09:53:00 V1 B0854 B 0854 2016-08-12 08:54:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-07 12:15:00 UTC" "2016-08-12 08:54:00 UTC"
##
## Begin times added
## [1] "2016-06-26 14:53:00 UTC"
##
## End times added:
## [1] "2016-06-26 13:53:00 UTC" "2016-07-24 15:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8208 0
## TRUE 0 604
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Houston, TX (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-03-23 22:53:00 V1 B2145 B 2145 2016-03-23 21:45:00 4080 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 7
## Total issues (may be less than sum due to same datetime in both): 7
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-23 22:53:00 V2 E16 E 16 2016-03-23 22:16:00 FALSE
## 2 2016-06-06 00:53:00 V1 E40 E 40 2016-06-06 00:40:00 FALSE
## 3 2016-08-14 00:53:00 V1 N53 N 53 2016-08-14 00:53:00 FALSE
## 4 2016-08-14 22:53:00 V1 N53 N 53 2016-08-14 22:53:00 FALSE
## 5 2016-08-21 19:53:00 V1 B1857 B 1857 2016-08-21 18:57:00 FALSE
## 6 2016-08-21 19:53:00 V2 E27 E 27 2016-08-21 19:27:00 FALSE
## 7 2016-08-21 19:53:00 V3 B45 B 45 2016-08-21 19:45:00 FALSE
## 8 2016-12-03 18:53:00 V1 E12 E 12 2016-12-03 18:12:00 FALSE
## 9 2016-12-04 10:53:00 V1 N53 N 53 2016-12-04 10:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-03-23 22:16:00 UTC" "2016-06-06 00:40:00 UTC"
## [3] "2016-12-03 18:12:00 UTC"
##
## Begin times added
## [1] "2016-08-14 00:53:00 UTC" "2016-08-14 22:53:00 UTC"
##
## End times added:
## [1] "2016-08-21 19:53:00 UTC" "2016-12-04 10:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8368 0
## TRUE 0 448
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Indianapolis, IN (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 6 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-07 23:54:00 V1 N54 N 54 2016-04-07 23:54:00 FALSE
## 2 2016-05-08 19:54:00 V1 N54 N 54 2016-05-08 19:54:00 FALSE
## 3 2016-07-18 17:54:00 V1 N54 N 54 2016-07-18 17:54:00 FALSE
## 4 2016-08-07 19:54:00 V1 N54 N 54 2016-08-07 19:54:00 FALSE
## 5 2016-09-05 21:54:00 V1 B01 B 01 2016-09-05 21:01:00 FALSE
## 6 2016-09-05 21:54:00 V2 E38 E 38 2016-09-05 21:38:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-09-05 21:01:00 UTC"
##
## Begin times added
## [1] "2016-07-18 17:54:00 UTC"
##
## End times added:
## [1] "2016-04-07 23:54:00 UTC" "2016-05-08 19:54:00 UTC"
## [3] "2016-08-07 19:54:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8144 0
## TRUE 0 623
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Las Vegas, NV (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-30 23:56:00 V1 N56 N 56 2016-04-30 23:56:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-04-30 23:56:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8710 0
## TRUE 0 108
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Lincoln, NE (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-07-13 18:54:00 V1 N54 N 54 2016-07-13 18:54:00 FALSE
## 2 2016-08-29 22:54:00 V1 N54 N 54 2016-08-29 22:54:00 FALSE
## 3 2016-09-16 06:54:00 V1 N54 N 54 2016-09-16 06:54:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-08-29 22:54:00 UTC" "2016-09-16 06:54:00 UTC"
##
## End times added:
## [1] "2016-07-13 18:54:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8500 0
## TRUE 0 313
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Milwaukee, WI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 4 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-10 10:52:00 V1 N52 N 52 2016-04-10 10:52:00 FALSE
## 2 2016-06-10 03:52:00 V1 N52 N 52 2016-06-10 03:52:00 FALSE
## 3 2016-08-20 13:52:00 V1 N52 N 52 2016-08-20 13:52:00 FALSE
## 4 2016-10-07 02:52:00 V1 N52 N 52 2016-10-07 02:52:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-04-10 10:52:00 UTC" "2016-06-10 03:52:00 UTC"
## [3] "2016-08-20 13:52:00 UTC" "2016-10-07 02:52:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8346 0
## TRUE 0 462
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Madison, WI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-09-07 12:53:00 V1 E52 E 52 2016-09-07 12:52:00 FALSE
## 2 2016-09-16 11:53:00 V1 N53 N 53 2016-09-16 11:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-09-07 12:52:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-09-16 11:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8347 0
## TRUE 0 451
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Minneapolis, MN (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2016-09-06 08:53:00 V1 B0744 B 0744 2016-09-06 07:44:00 4140 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 8
## Total issues (may be less than sum due to same datetime in both): 8
##
##
## Records to be addressed include:
## # A tibble: 9 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-04-05 15:53:00 V1 E22 E 22 2016-04-05 15:22:00 FALSE
## 2 2016-05-29 19:53:00 V1 B31 B 31 2016-05-29 19:31:00 FALSE
## 3 2016-06-20 04:53:00 V1 B11 B 11 2016-06-20 04:11:00 FALSE
## 4 2016-09-06 07:53:00 V1 N53 N 53 2016-09-06 07:53:00 FALSE
## 5 2016-09-06 08:53:00 V2 E01 E 01 2016-09-06 08:01:00 FALSE
## 6 2016-09-06 08:53:00 V3 B17 B 17 2016-09-06 08:17:00 FALSE
## 7 2016-10-18 04:53:00 V1 E35 E 35 2016-10-18 04:35:00 FALSE
## 8 2016-11-22 13:53:00 V1 N53 N 53 2016-11-22 13:53:00 FALSE
## 9 2016-11-29 07:53:00 V1 B18 B 18 2016-11-29 07:18:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-10-18 04:35:00 UTC" "2016-11-29 07:18:00 UTC"
##
## Begin times added
## [1] "2016-04-05 15:53:00 UTC" "2016-09-06 07:53:00 UTC"
##
## End times added:
## [1] "2016-05-29 19:53:00 UTC" "2016-06-20 04:53:00 UTC"
## [3] "2016-09-06 08:53:00 UTC" "2016-11-22 13:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8390 0
## TRUE 0 427
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: New Orleans, LA (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime deltaMETAR
## <dttm> <chr> <chr> <chr> <chr> <dttm> <drtn>
## 1 2017-01-01 09:53:00 V1 E0852 E 0852 2017-01-01 08:52:00 3660 secs
## # ... with 2 more variables: deltaPrev <dbl>, issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 2
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-11 20:53:00 V1 B07 B 07 2016-03-11 20:07:00 FALSE
## 2 2016-03-27 16:53:00 V1 N53 N 53 2016-03-27 16:53:00 FALSE
## 3 2016-08-14 23:53:00 V1 B39 B 39 2016-08-14 23:39:00 FALSE
## 4 2016-12-13 23:53:00 V1 E2258 E 2258 2016-12-13 22:58:00 FALSE
## 5 2016-12-13 23:53:00 V2 B30 B 30 2016-12-13 23:30:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Adding final end time to cap interval at end of file: 2017-01-01 23:53:01
##
## Start/end times deleted:
## [1] "2016-03-11 20:07:00 UTC" "2016-12-13 22:58:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-03-27 16:53:00 UTC" "2016-08-14 23:53:00 UTC"
## [3] "2017-01-01 23:53:01 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8375 0
## TRUE 0 438
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Chicago, IL (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-24 21:51:00 V1 N51 N 51 2016-03-24 21:51:00 FALSE
## 2 2016-07-13 18:51:00 V1 N51 N 51 2016-07-13 18:51:00 FALSE
## 3 2016-08-29 19:51:00 V1 N51 N 51 2016-08-29 19:51:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-03-24 21:51:00 UTC" "2016-08-29 19:51:00 UTC"
##
## End times added:
## [1] "2016-07-13 18:51:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8349 0
## TRUE 0 466
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: San Diego, CA (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8612 0
## TRUE 0 198
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Traverse City, MI (2016)
##
## Regex search code is: ((?<!FZ)RA[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 7 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-03-28 10:53:00 V1 E0957 E 0957 2016-03-28 09:57:00 FALSE
## 2 2016-06-06 20:53:00 V1 B39 B 39 2016-06-06 20:39:00 FALSE
## 3 2016-06-06 20:53:00 V2 E49 E 49 2016-06-06 20:49:00 FALSE
## 4 2016-06-06 21:53:00 V1 B17 B 17 2016-06-06 21:17:00 FALSE
## 5 2016-06-06 21:53:00 V2 E38 E 38 2016-06-06 21:38:00 FALSE
## 6 2016-09-28 19:53:00 V1 N53 N 53 2016-09-28 19:53:00 FALSE
## 7 2016-12-02 16:53:00 V1 E29 E 29 2016-12-02 16:29:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-03-28 09:57:00 UTC" "2016-06-06 21:17:00 UTC"
## [3] "2016-12-02 16:29:00 UTC"
##
## Begin times added
## [1] "2016-06-06 20:53:00 UTC"
##
## End times added:
## [1] "2016-09-28 19:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8345 0
## TRUE 0 469
##
## Full matches between METAR observations and intervals
fileNames <- grep(pattern="2016", names(cityNameMapper), value=TRUE)
snowList <- vector("list", length(fileNames))
names(snowList) <- paste0(fileNames, "_SN")
for (city in fileNames) {
cat("\n\n*** Processing for:", cityNameMapper[city], "\n")
snowList[[paste0(city, "_SN")]] <- combinePrecipFunctions(get(city),
pType="(?<!BL)SN",
errorLengthInconsistency=TRUE
)
}
##
##
## *** Processing for: Detroit, MI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-09 10:53:00 V1 N53 N 53 2016-02-09 10:53:00 FALSE
## 2 2016-02-24 11:53:00 V1 B12 B 12 2016-02-24 11:12:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-02-09 10:53:00 UTC"
##
## End times added:
## [1] "2016-02-24 11:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8344 0
## TRUE 0 474
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Newark, NJ (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-10 14:51:00 V1 N51 N 51 2016-02-10 14:51:00 FALSE
## 2 2016-11-20 07:51:00 V1 N51 N 51 2016-11-20 07:51:00 FALSE
## 3 2016-11-20 08:51:00 V1 E0756 E 0756 2016-11-20 07:56:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-11-20 07:56:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-02-10 14:51:00 UTC" "2016-11-20 07:51:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8683 0
## TRUE 0 138
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Green Bay, WI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-12-07 15:53:00 V1 E09 E 09 2016-12-07 15:09:00 FALSE
## 2 2016-12-07 15:53:00 V2 B38 B 38 2016-12-07 15:38:00 FALSE
## 3 2016-12-29 17:53:00 V1 N53 N 53 2016-12-29 17:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-12-07 15:09:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-12-29 17:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8460 0
## TRUE 0 343
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Grand Rapids, MI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 3
## Total issues (may be less than sum due to same datetime in both): 3
##
##
## Records to be addressed include:
## # A tibble: 3 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-04 20:53:00 V1 N53 N 53 2016-02-04 20:53:00 FALSE
## 2 2016-04-02 16:53:00 V1 N53 N 53 2016-04-02 16:53:00 FALSE
## 3 2016-04-02 17:53:00 V1 N53 N 53 2016-04-02 17:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-02-04 20:53:00 UTC" "2016-04-02 17:53:00 UTC"
##
## End times added:
## [1] "2016-04-02 16:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8137 0
## TRUE 0 675
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Houston, TX (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8816
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Indianapolis, IN (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8529 0
## TRUE 0 238
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Las Vegas, NV (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8818
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Lincoln, NE (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-19 12:54:00 V1 B1155 B 1155 2016-01-19 11:55:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-01-19 11:55:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8719 0
## TRUE 0 94
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Milwaukee, WI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 1
## Total issues (may be less than sum due to same datetime in both): 1
##
##
## Records to be addressed include:
## # A tibble: 1 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-26 01:52:00 V1 E39 E 39 2016-01-26 01:39:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-01-26 01:39:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8462 0
## TRUE 0 346
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Madison, WI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 3
## Issues where state change does not add to the total: 6
## Total issues (may be less than sum due to same datetime in both): 6
##
##
## Records to be addressed include:
## # A tibble: 6 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-09 14:53:00 V1 E39 E 39 2016-01-09 14:39:00 FALSE
## 2 2016-01-12 12:53:00 V1 N53 N 53 2016-01-12 12:53:00 FALSE
## 3 2016-02-09 04:53:00 V1 N53 N 53 2016-02-09 04:53:00 FALSE
## 4 2016-02-09 05:53:00 V1 N53 N 53 2016-02-09 05:53:00 FALSE
## 5 2016-04-08 14:53:00 V1 E36 E 36 2016-04-08 14:36:00 FALSE
## 6 2016-12-12 05:53:00 V1 E01 E 01 2016-12-12 05:01:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-01-09 14:39:00 UTC" "2016-04-08 14:36:00 UTC"
## [3] "2016-12-12 05:01:00 UTC"
##
## Begin times added
## [1] "2016-02-09 05:53:00 UTC"
##
## End times added:
## [1] "2016-01-12 12:53:00 UTC" "2016-02-09 04:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8435 0
## TRUE 0 363
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Minneapolis, MN (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 4
## Total issues (may be less than sum due to same datetime in both): 4
##
##
## Records to be addressed include:
## # A tibble: 4 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-01-12 00:53:00 V1 N53 N 53 2016-01-12 00:53:00 FALSE
## 2 2016-12-07 14:53:00 V1 N53 N 53 2016-12-07 14:53:00 FALSE
## 3 2016-12-07 15:53:00 V1 N53 N 53 2016-12-07 15:53:00 FALSE
## 4 2016-12-08 22:53:00 V1 N53 N 53 2016-12-08 22:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## [1] "2016-01-12 00:53:00 UTC" "2016-12-07 15:53:00 UTC"
## [3] "2016-12-08 22:53:00 UTC"
##
## End times added:
## [1] "2016-12-07 14:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8438 0
## TRUE 0 379
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: New Orleans, LA (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8813
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Chicago, IL (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 5
## Total issues (may be less than sum due to same datetime in both): 5
##
##
## Records to be addressed include:
## # A tibble: 5 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-02-10 00:51:00 V1 N51 N 51 2016-02-10 00:51:00 FALSE
## 2 2016-03-24 21:51:00 V1 N51 N 51 2016-03-24 21:51:00 FALSE
## 3 2016-04-08 22:51:00 V1 N51 N 51 2016-04-08 22:51:00 FALSE
## 4 2016-04-09 04:51:00 V1 E46 E 46 2016-04-09 04:46:00 FALSE
## 5 2016-12-24 01:51:00 V1 N51 N 51 2016-12-24 01:51:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-04-09 04:46:00 UTC"
##
## Begin times added
## [1] "2016-02-10 00:51:00 UTC" "2016-03-24 21:51:00 UTC"
## [3] "2016-04-08 22:51:00 UTC" "2016-12-24 01:51:00 UTC"
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 8532 0
## TRUE 0 283
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: San Diego, CA (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 0
## Issues where state change does not add to the total: 0
## Total issues (may be less than sum due to same datetime in both): 0
##
##
## Records to be addressed include:
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, lastRecord <lgl>, lagPrecip <lgl>,
## # curPrecip <lgl>
##
## Start/end times deleted:
## POSIXct of length 0
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## POSIXct of length 0
## intMETAR
## precipMETAR FALSE
## FALSE 8810
##
## Full matches between METAR observations and intervals
##
##
## *** Processing for: Traverse City, MI (2016)
##
## Regex search code is: ((?<!BL)SN[B|E]\d+[0-9BE]*)
##
## Continuity or consistency error - record(s) will be deleted
## # A tibble: 0 x 9
## # ... with 9 variables: dtime <dttm>, name <chr>, value <chr>, chgType <chr>,
## # chgNum <chr>, chgTime <dttm>, deltaMETAR <drtn>, deltaPrev <dbl>,
## # issueCons <lgl>
##
##
## Issues with begin when precipitation or end when no precipitation: 1
## Issues where state change does not add to the total: 2
## Total issues (may be less than sum due to same datetime in both): 2
##
##
## Records to be addressed include:
## # A tibble: 2 x 9
## dtime name value chgType chgNum chgTime lastRecord
## <dttm> <chr> <chr> <chr> <chr> <dttm> <lgl>
## 1 2016-12-02 16:53:00 V1 E40 E 40 2016-12-02 16:40:00 FALSE
## 2 2016-12-18 06:53:00 V1 N53 N 53 2016-12-18 06:53:00 FALSE
## # ... with 2 more variables: lagPrecip <lgl>, curPrecip <lgl>
##
## Start/end times deleted:
## [1] "2016-12-02 16:40:00 UTC"
##
## Begin times added
## POSIXct of length 0
##
## End times added:
## [1] "2016-12-18 06:53:00 UTC"
## intMETAR
## precipMETAR FALSE TRUE
## FALSE 7828 0
## TRUE 0 986
##
## Full matches between METAR observations and intervals
All precipitation intervals extracted are consistent with the precipitation noted in the METAR.
Plots can then be created for the rain data and snow data:
rainLength <- precipByLocale(rainList)
plotPrecipitation(rainLength, pTypeName="rain")
snowLength <- precipByLocale(snowList)
## Warning: Factor `month` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## Warning: Factor `month` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## Warning: Factor `month` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## Warning: Factor `month` contains implicit NA, consider using
## `forcats::fct_explicit_na`
plotPrecipitation(snowLength, pTypeName="snow")
Findings from the plots include:
RAIN:
SNOW:
THUNDER:
Next steps are to combine all of the precipitation functions, allow for output to be directed to a log file, and save processed EDA and precipitation data for later use.